Abstract


The focus of this research was the implementation of a forward kinematic algorithm for
the Utah MIT Dexterous Hand (UMDH). Specifically, the algorithm was synthesized from
mathematical models onto a Field Programmable Gate Array (FPGA) processor. This approach
differs from the classical, general-purpose microprocessor design, in which all robotic
controller functions, including forward kinematics, are executed serially from a compiled
programming language such as C. The compiled code and the accompanying real-time operating
system must be stored on some form of nonvolatile memory, typically magnetic media such as
a fixed or hard disk drive, along with other computer hardware components that allow the
user to load and execute the software. With a future goal of moving the controllers to a
portable platform such as a dexterous prosthetic hand for amputee patients, the application
of such a hardware implementation is impossible.

Instead, this research explores a different implementation based on a modular approach of
dedicated hardware controllers. The controller for the forward kinematics of the UMDH is
used as a test case. The resulting FPGA processor replaces a robotic system’s burden of
repetitive and discrete software system calls with a stand-alone hardware interface that
appears more like a single hardware function call. The robotic system is free to tackle
other tasks while the FPGA processor is busy computing the results of the algorithm.

The forward kinematic algorithm for the UMDH was chosen as the test case because of its
familiarity among the academic community. Although considerable time was spent deriving the
equations, the specifics of the UMDH algorithm itself were not the focus of this thesis.
Rather, the focus was on the implementation of such an extensive and complex algorithm onto
an FPGA processor. Forward kinematic algorithms from other portable robotic devices, such as
planetary rovers, flight line bomb loaders, or teleoperation systems, could have been
implemented just as well.

This thesis is divided into three parts. First, the UMDH is examined and the forward
kinematic equations for it are developed. This stage will be different for every robotic
system, but the process will remain the same. Second, the resulting equations are evaluated
for maximum and minimum numeric ranges and the desired amount of precision. This information
is used in the third part, where mathematical, memory storage, and controller functional
units are developed. Specifically, VHDL models are created, simulated, synthesized, and
placed into an FPGA processor.

1. Introduction


1.1  Background 

Although robotic devices have been in existence for many years, they were hindered by
high computational demands until the digital computer revolution came about. Today, highly
sophisticated control algorithms are written in software, usually with a real-time operating
system such as Chimera (Khosla), VxWorks (Wind), or Condor (Narasimhan), executing on a
VME-based processor or similar dedicated hardware platform. Each part of the algorithm may
execute concurrently with other parts and may be highly repetitive in nature.

One particular part that is highly repetitive is the calculation of the forward kinematics
of the device. The forward kinematics transform the joint angles of the device into the
spatial position and orientation of the end of the device. Even a small motion at the base
of the device may cause considerable motion farther out at the tip, so the transform must
be calculated repetitively in order to keep track of the device in Cartesian coordinates.

1.2  Problem  Statement 

The forward kinematics of the Utah MIT Dexterous Hand (UMDH) (Sarcos) will be
developed and implemented on a Xilinx Field Programmable Gate Array (FPGA) (Xilinx). The
result is a Forward Kinematic Processor for the UMDH that will autonomously calculate the
results while the surrounding system performs more task-specific operations.

1.3 Assumptions

Although the process used to calculate the forward kinematics is the same for most
common robotic devices, there could exist a device or devices which would not easily map to
the algorithms discussed. One example is a parallel linkage device like a bomb loader. It is
assumed that the developed algorithm is for the UMDH specifically and that all UMDHs are
mechanically identical.

1.4  Approach 

The design of the Forward Kinematic Processor starts with the development of the
forward kinematic algorithm specifically for the UMDH. This algorithm is evaluated for
arithmetic and transcendental properties and arranged such that a minimum amount of hardware
time is required. The required arithmetic and transcendental operations lead to the
development of functional units to process the numeric data. The functional units are then
integrated into one complete processing unit and synthesized from VHDL code to logic blocks
on a Xilinx FPGA.

1.5  Overview 

The remaining chapters of this document describe the development and implementation of
the Forward Kinematic Processor. Chapter 2 reviews the mathematical foundation of general
forward kinematics and applies it to the specific nature of the UMDH. Chapter 3 looks at the
results of Chapter 2, particularly the equations for position and orientation, and evaluates
them for magnitude constraints, required precision, and operational occurrences. Chapter 4
describes the development of a VHDL model that simulates the digital hardware implementation
of an application-specific microprocessor that can compute the equations from Chapter 2.
Chapter 5 deals with synthesizing the model directly to a Xilinx FPGA. Chapter 6 evaluates
the results and Chapter 7 discusses recommendations and possible future work.

FPGA Processor Implementation for the Forward Kinematics of the UMDH

2. Literature Review and Background


2.1  Review 

As mentioned in Chapter 1, a typical robotics research environment consists of a real-time
operating system supported by a relatively large hardware platform. The use of such a system
allows researchers to quickly change various parameters of the control structure for robotic
devices. Although dedicated hardware may show an increase in performance for a particular
application, building and maintaining it is sometimes too much overhead for researchers
whose primary focus is robotics, not hardware design (Narasimhan).

The concept of a dexterous prosthetic hand requires a controller that moves with the
device. A generalized hardware platform would be much too large to be portable, and such
area requirements may necessitate a custom hardware implementation (Narasimhan). With the
hopes of a stand-alone dexterous prosthetic hand and the advent and popularity of the FPGA,
it is now possible to merge the two technologies and create a truly portable solution. As
the controller algorithms in the research laboratory are upgraded, they can be downloaded
into the existing hardware of the hand using the reconfigurable properties of the FPGA
(Xilinx).

2.2  Introduction 

This chapter discusses a method to represent the mechanical attributes of a particular
manipulator. This representation is then used to determine the transformation from the
relative angles of each link to the 3-dimensional coordinate locations and orientations of
the tip of the end link. The process, known as forward kinematics, is then applied to the
unique nature of the Utah MIT Dexterous Hand (UMDH). Specifically, the thumb mechanism of
the UMDH is evaluated and the resulting control equations will form the basis for FPGA
implementation in the remaining chapters.

2.3  Review  of  Forward  Kinematic  Computations  and  the  Denavit-Hartenberg 
Notation  (Craig) 

In order to represent the mechanical attributes of any general-purpose manipulator, a
convention is formulated that relates the various physical parts that make up the
manipulator. A manipulator is composed of rigid links connected by joints that allow
relative motion of the neighboring links. Most manipulators have joints that are either
revolute or prismatic, as shown in Figure 2.1. Revolute joints are typical hinge-style
joints, and the unit of measurement is the joint angle between the two halves of the joint.
Prismatic joints are designed such that one half can slide back and forth in relation to the
fixed half; the measuring unit is the joint offset between the two halves. Other possible
joint configurations include cylindrical, planar, screw, and spherical (Craig:69).

Link 0 is considered to be the immobile base of the manipulator. Link 1 is the first moving
part, followed by link 2, and so on out to the end link n. The axes of the joints that
connect the links are measured relative to the previous axis. Each joint axis defines a
vector about which the next link in the chain rotates; this vector is expressed in the
coordinate frame of the previous joint. Note that a link and the joint preceding it are
given the same index. Two quantities measure the difference between two neighboring axes,
as shown in Figure 2.2. First, the link length $a_{i-1}$ is the length of the line that is
mutually perpendicular to both axes. Second, the link twist $\alpha_{i-1}$ is the angle
between axis i-1 and a parallel projection of axis i onto the origin point of that
perpendicular line.

Figure 2.1. The Six Possible Joints

For links that have a common joint between them, there are two quantities that can be
measured. First, the link offset $d_i$ is the distance between the connection points of the
two links along the axis of the common joint. If this value is zero, the joint resembles a
door hinge. If the value is non-zero, the joint resembles a scissors-like hinge where the
two links use the same joint but are slightly offset from each other. Second, the joint
angle $\theta_i$ is the rotational difference between the two links about their common
joint. These two quantities are shown in Figure 2.3. If the joint is revolute, then the link
offset is fixed and the joint angle is allowed to vary. Similarly, if the joint is
prismatic, then the joint angle is fixed and the link offset is allowed to vary. For the
first and last links, the fixed quantity is set to zero (Craig:73).

Figure 2.2. Link Length and Link Twist

These four quantities, link length $a_{i-1}$, link twist $\alpha_{i-1}$, link offset $d_i$,
and joint angle $\theta_i$, allow for the unique description of any common manipulator.
Together, they form a convention known as the Denavit-Hartenberg (DH) notation (Craig:74).
The four quantities are regularly placed into a DH table containing the information for all
degrees of freedom of the manipulator (Craig:68-82; Rattan:37-44).

The next step is to relate the frames of links i and i-1. To do this, three intermediate
frames are created to allow the transformation from one link to the next. Figure 2.4 shows
the addition of these three frames, denoted R, Q, and P (Craig:83).

Figure 2.4. Intermediate Frames


First, the R frame is placed at the same origin as the i-1 frame but rotated about the
x-axis by the link twist $\alpha_{i-1}$. The Q frame is then placed in the same orientation
as R but shifted along the x-axis by the link length $a_{i-1}$ towards the next link. The P
frame is then placed at the same origin as Q but rotated about the z-axis by the joint angle
$\theta_i$. Finally, the frame of link i has the same orientation as P but is shifted along
the z-axis by the link offset $d_i$ towards the next link (Craig:83-84; Rattan:45-52).

Because moving from i-1 to R is a rotation, its rotational matrix is given by Equation 2.1.
The transformation from R to Q is given by the positional scaling vector of Equation 2.2.
Together, Equations 2.1 and 2.2 form the transformation matrix shown in Equation 2.3.

$$
\text{Rotation about x-axis} =
\begin{bmatrix}
1 & 0 & 0 \\
0 & \cos(\alpha_{i-1}) & -\sin(\alpha_{i-1}) \\
0 & \sin(\alpha_{i-1}) & \cos(\alpha_{i-1})
\end{bmatrix}
\tag{2.1}
$$

$$
\text{Scaling along x-axis} =
\begin{bmatrix}
a_{i-1} \\ 0 \\ 0
\end{bmatrix}
\tag{2.2}
$$

$$
\text{Transform } (i-1 \text{ to } Q) =
\begin{bmatrix}
1 & 0 & 0 & a_{i-1} \\
0 & \cos(\alpha_{i-1}) & -\sin(\alpha_{i-1}) & 0 \\
0 & \sin(\alpha_{i-1}) & \cos(\alpha_{i-1}) & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}
\tag{2.3}
$$


Similarly, moving from Q to P is a rotation. Its rotational matrix is given by Equation 2.4.
The transformation from P to i is given by the positional scaling vector of Equation 2.5.
Together, Equations 2.4 and 2.5 form the transformation matrix shown in Equation 2.6.

$$
\text{Rotation about z-axis} =
\begin{bmatrix}
\cos(\theta_i) & -\sin(\theta_i) & 0 \\
\sin(\theta_i) & \cos(\theta_i) & 0 \\
0 & 0 & 1
\end{bmatrix}
\tag{2.4}
$$

$$
\text{Scaling along z-axis} =
\begin{bmatrix}
0 \\ 0 \\ d_i
\end{bmatrix}
\tag{2.5}
$$

$$
\text{Transform } (Q \text{ to } i) =
\begin{bmatrix}
\cos(\theta_i) & -\sin(\theta_i) & 0 & 0 \\
\sin(\theta_i) & \cos(\theta_i) & 0 & 0 \\
0 & 0 & 1 & d_i \\
0 & 0 & 0 & 1
\end{bmatrix}
\tag{2.6}
$$

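The complete transformation is the matrix multiplication of Equations 2.3 and 2.6. This is
the transformation from link i-1 to link i and is shown in Equation 2.7.

$$
\text{Transform } (i-1 \text{ to } i) =
\begin{bmatrix}
\cos(\theta_i) & -\sin(\theta_i) & 0 & a_{i-1} \\
\sin(\theta_i)\cos(\alpha_{i-1}) & \cos(\theta_i)\cos(\alpha_{i-1}) & -\sin(\alpha_{i-1}) & -\sin(\alpha_{i-1})\,d_i \\
\sin(\theta_i)\sin(\alpha_{i-1}) & \cos(\theta_i)\sin(\alpha_{i-1}) & \cos(\alpha_{i-1}) & \cos(\alpha_{i-1})\,d_i \\
0 & 0 & 0 & 1
\end{bmatrix}
\tag{2.7}
$$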
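The relationship between Equations 2.3, 2.6, and 2.7 can be checked numerically. The following is a minimal sketch, not code from the thesis: the function names and the sample values are illustrative assumptions, and plain nested lists stand in for matrices.

```python
import math

def matmul(a, b):
    """Multiply two 4x4 matrices stored as nested lists."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]

def twist_then_length(alpha, a):
    """Equation 2.3: rotate about x by the link twist, shift along x by the link length."""
    ca, sa = math.cos(alpha), math.sin(alpha)
    return [[1, 0, 0, a], [0, ca, -sa, 0], [0, sa, ca, 0], [0, 0, 0, 1]]

def angle_then_offset(theta, d):
    """Equation 2.6: rotate about z by the joint angle, shift along z by the link offset."""
    ct, st = math.cos(theta), math.sin(theta)
    return [[ct, -st, 0, 0], [st, ct, 0, 0], [0, 0, 1, d], [0, 0, 0, 1]]

def link_transform(alpha, a, d, theta):
    """Equation 2.7 written out entry by entry."""
    ca, sa = math.cos(alpha), math.sin(alpha)
    ct, st = math.cos(theta), math.sin(theta)
    return [[ct, -st, 0, a],
            [st * ca, ct * ca, -sa, -sa * d],
            [st * sa, ct * sa, ca, ca * d],
            [0, 0, 0, 1]]

# Arbitrary sample values (radians and inches), chosen only for the comparison.
alpha, a, d, theta = 0.3, 1.5, 0.25, 0.7
product = matmul(twist_then_length(alpha, a), angle_then_offset(theta, d))
closed = link_transform(alpha, a, d, theta)
worst = max(abs(product[r][c] - closed[r][c]) for r in range(4) for c in range(4))
```

Here `worst` holds the largest entry-wise difference between the product of the two half transforms and the closed form, which should be at floating-point rounding level.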

To find the nth frame, simply multiply the transforms of each intermediate frame together,
as in Equation 2.8a. Equation 2.8b shows the final transformation matrix from 0 to n. The
result is a 4 by 4 matrix that represents the orientation of frame n with respect to frame 0
and the location of the last link with respect to frame 0. The first column represents the
normal vector N, the second column represents the sliding vector S, the third column
represents the approach vector A, and the fourth column represents the position vector P.
Because of the zeros and ones in Equations 2.3 and 2.6, the fourth row will always be
[0 0 0 1] (Craig:84-85; Rattan:53, 55).

$$
{}^{0}_{n}T = {}^{0}_{1}T \; {}^{1}_{2}T \cdots {}^{n-1}_{n}T
\tag{2.8a}
$$

$$
{}^{0}_{n}T =
\begin{bmatrix}
N_x & S_x & A_x & P_x \\
N_y & S_y & A_y & P_y \\
N_z & S_z & A_z & P_z \\
0 & 0 & 0 & 1
\end{bmatrix}
\tag{2.8b}
$$

If there is an extension from the last joint, such as a tool or a fingertip of length L in
the case of the UMDH, the orientation is the same as that of the joint itself, but the
position is shifted by the amount L along the normal vector N of the joint. Equations 2.9,
2.10, and 2.11 show the modification to the position vector of the last joint that gives the
new position vector of the end of the extension (Solanki and Rattan:72).

$$
P_x = P_x + N_x L
\tag{2.9}
$$

$$
P_y = P_y + N_y L
\tag{2.10}
$$

$$
P_z = P_z + N_z L
\tag{2.11}
$$
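The chaining of Equation 2.8a, the column layout of Equation 2.8b, and the tip extension of Equations 2.9 through 2.11 can be sketched together. This is only an illustration under stated assumptions: the helper names are hypothetical, and the two sample transforms (a quarter turn about z followed by a unit shift along x) are not UMDH values.

```python
def matmul(a, b):
    """Multiply two 4x4 matrices stored as nested lists."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]

def chain(transforms):
    """Equation 2.8a: multiply the link transforms in order, base link first."""
    total = [[1 if r == c else 0 for c in range(4)] for r in range(4)]
    for t in transforms:
        total = matmul(total, t)
    return total

def split_nsap(t):
    """Equation 2.8b: the columns are the normal, sliding, approach, and position vectors."""
    return tuple([t[r][c] for r in range(3)] for c in range(4))

def extend_tip(p, n, length):
    """Equations 2.9 to 2.11: shift the position by `length` along the normal vector."""
    return [p[k] + n[k] * length for k in range(3)]

# Illustrative transforms only (assumed for this sketch, not taken from the UMDH).
quarter_turn = [[0, -1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
shift_x = [[1, 0, 0, 1], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

n, s, a, p = split_nsap(chain([quarter_turn, shift_x]))
tip = extend_tip(p, n, 0.5)
```

For these sample transforms the last frame sits one unit along the base y-axis with its normal vector also pointing along y, so the extension moves the tip a further 0.5 units in that direction.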

2.4  UMDH  Forward  Kinematic  Computations 

The UMDH shown in Figure 2.5 is composed of three fingers and a thumb. The three
fingers are kinematically identical with the exception of their offsets at the knuckle
locations. The thumb is slightly different from the fingers, and it is located between the
first and second fingers on the palm of the hand.


Figure  2.5.  Utah  MIT  Dextrous  Hand 

Figures 2.6 and 2.7 show the top and side views of the UMDH, respectively (Solanki and
Rattan:67-68). Notice how the 0th frame is located back towards the wrist. It is defined at
this location because it is the intersection of the joint axes for both the thumb and the
middle finger. A different location could have been chosen, but it would result in more
complicated transformation matrices (Solanki and Rattan:66).

Figure 2.6. Top View of UMDH (thumb extends out of page)

Figure 2.7. Side View of UMDH

Because the three fingers and the thumb are almost kinematically identical, only one will
be further explored. The thumb mechanism alone represents a serial chain manipulator with
four degrees of freedom resulting from the four revolute joints. The DH table for the thumb
of the UMDH in this configuration is shown in Table 2.1 (Solanki and Rattan:69). Using these
values and Equation 2.7, each link relationship can be calculated. Replacing the i and i-1
variables with the fixed quantities from the DH table results in much simplified versions of
the transformation matrices. Equations 2.12 through 2.15 show each intermediate matrix
(Solanki and Rattan:70).


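Table 2.1. DH table for Thumb of UMDH

i    link twist               link length            link offset       joint angle
1    $\alpha_0 = 0^\circ$     $a_0 = -0.75$"         $d_1 = 3.125$"    $\theta_1$
2    $\alpha_1 = 90^\circ$    $a_1 = 0.375$"         $d_2 = 0$         $\theta_2$
3    $\alpha_2 = 0^\circ$     $a_2 = 1.700$"         $d_3 = 0$         $\theta_3$
4    $\alpha_3 = 0^\circ$     $a_3$ = [illegible]    $d_4 = 0$         $\theta_4$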

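$$
{}^{0}_{1}T =
\begin{bmatrix}
\cos(\theta_1) & -\sin(\theta_1) & 0 & a_0 \\
\sin(\theta_1) & \cos(\theta_1) & 0 & 0 \\
0 & 0 & 1 & d_1 \\
0 & 0 & 0 & 1
\end{bmatrix}
\tag{2.12}
$$

$$
{}^{1}_{2}T =
\begin{bmatrix}
\cos(\theta_2) & -\sin(\theta_2) & 0 & a_1 \\
0 & 0 & -1 & 0 \\
\sin(\theta_2) & \cos(\theta_2) & 0 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}
\tag{2.13}
$$

$$
{}^{2}_{3}T =
\begin{bmatrix}
\cos(\theta_3) & -\sin(\theta_3) & 0 & a_2 \\
\sin(\theta_3) & \cos(\theta_3) & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}
\tag{2.14}
$$

$$
{}^{3}_{4}T =
\begin{bmatrix}
\cos(\theta_4) & -\sin(\theta_4) & 0 & a_3 \\
\sin(\theta_4) & \cos(\theta_4) & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}
\tag{2.15}
$$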


These four transformation matrices are concatenated into one using Equation 2.8a. The
result, after consecutive matrix multiplications, is shown in Equation 2.16 (Solanki and
Rattan:71).

$$
{}^{0}_{4}T =
\begin{bmatrix}
\cos(\theta_1)\cos(\theta_2+\theta_3+\theta_4) & -\cos(\theta_1)\sin(\theta_2+\theta_3+\theta_4) & \sin(\theta_1) & a_0 + \cos(\theta_1)\,(a_1 + a_2\cos(\theta_2) + a_3\cos(\theta_2+\theta_3)) \\
\sin(\theta_1)\cos(\theta_2+\theta_3+\theta_4) & -\sin(\theta_1)\sin(\theta_2+\theta_3+\theta_4) & -\cos(\theta_1) & \sin(\theta_1)\,(a_1 + a_2\cos(\theta_2) + a_3\cos(\theta_2+\theta_3)) \\
\sin(\theta_2+\theta_3+\theta_4) & \cos(\theta_2+\theta_3+\theta_4) & 0 & a_2\sin(\theta_2) + a_3\sin(\theta_2+\theta_3) + d_1 \\
0 & 0 & 0 & 1
\end{bmatrix}
\tag{2.16}
$$

The elements within the matrix of Equation 2.16 correspond one to one with those of
Equation 2.8b. The resulting twelve equations out of sixteen (four entries are a constant 0
or 1) can now be used as the basis for the remaining chapters.
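As a sanity check on the concatenation, the four thumb matrices of Equations 2.12 through 2.15 can be multiplied numerically and compared with the closed form of Equation 2.16. The sketch below is an illustration rather than the thesis code; the function names are hypothetical, the joint angles are arbitrary, and the value used for $a_3$ is a placeholder assumption.

```python
import math

def matmul(a, b):
    """Multiply two 4x4 matrices stored as nested lists."""
    return [[sum(a[r][k] * b[k][c] for k in range(4)) for c in range(4)]
            for r in range(4)]

def thumb_links(th, a, d1):
    """Equations 2.12 to 2.15, with th = (theta1..theta4) and a = (a0..a3)."""
    c = [math.cos(x) for x in th]
    s = [math.sin(x) for x in th]
    t01 = [[c[0], -s[0], 0, a[0]], [s[0], c[0], 0, 0], [0, 0, 1, d1], [0, 0, 0, 1]]
    t12 = [[c[1], -s[1], 0, a[1]], [0, 0, -1, 0], [s[1], c[1], 0, 0], [0, 0, 0, 1]]
    t23 = [[c[2], -s[2], 0, a[2]], [s[2], c[2], 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
    t34 = [[c[3], -s[3], 0, a[3]], [s[3], c[3], 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
    return t01, t12, t23, t34

def thumb_closed_form(th, a, d1):
    """Equation 2.16 written out entry by entry."""
    t1, t2, t3, t4 = th
    c1, s1 = math.cos(t1), math.sin(t1)
    c234, s234 = math.cos(t2 + t3 + t4), math.sin(t2 + t3 + t4)
    reach = a[1] + a[2] * math.cos(t2) + a[3] * math.cos(t2 + t3)
    return [[c1 * c234, -c1 * s234, s1, a[0] + c1 * reach],
            [s1 * c234, -s1 * s234, -c1, s1 * reach],
            [s234, c234, 0, a[2] * math.sin(t2) + a[3] * math.sin(t2 + t3) + d1],
            [0, 0, 0, 1]]

A3_PLACEHOLDER = 1.0  # assumption: stands in for the a3 entry of Table 2.1
lengths = (-0.75, 0.375, 1.700, A3_PLACEHOLDER)
angles = tuple(math.radians(x) for x in (30.0, 20.0, 40.0, 10.0))

t01, t12, t23, t34 = thumb_links(angles, lengths, 3.125)
chained = matmul(matmul(matmul(t01, t12), t23), t34)
closed = thumb_closed_form(angles, lengths, 3.125)
max_diff = max(abs(chained[r][c] - closed[r][c]) for r in range(4) for c in range(4))
```

Because the comparison holds for any link length, the placeholder does not affect the check; only the agreement between the chained product and the closed form matters here.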

2.5  Conclusions 

This chapter investigated a mathematical method for the calculation of the forward
kinematic equations of the thumb mechanism of the Utah MIT Dexterous Hand. The resulting
Equation 2.16, in the form of Equation 2.8b, represents the location and orientation of the
last joint of the UMDH. It does not directly give the location of the tip of the thumb;
that requires applying Equations 2.9 through 2.11 to the results of Equation 2.16. The L
term can be fixed as the length of the last link, 1.3 inches, if the desired answer is for
the tip of the thumb. Other L values can be used to represent tools attached to the tip,
such as force or temperature sensors. The remaining chapters deal with Equation 2.16, since
it represents the base configuration of all UMDHs.

3. Algorithm Analysis and Profiling


3.1  Introduction 

Before  a  physical  computational  architecture  can  be  defined  for  implementation,  the 
twelve  equations  derived  in  Chapter  2  need  to  be  evaluated  in  the  context  of  the  desired 
performance  of  the  UMDH.  Only  those  hardware  components  that  are  absolutely  necessary  will 
be  implemented.  It  is  proposed  that  the  desired  forward  kinematic  processor  deals  only  with 
mathematical  operations  and  does  not  work  with  concepts  such  as  character  strings,  addressing 
modes,  or  conditional  branches  typically  found  in  a  general  purpose  microprocessor.  Therefore, 
this  chapter  deals  with  the  trade-offs  involved  in  finding  an  optimum  hardware  representation  for 
both  high  performance  and  low  hardware  overhead. 

3.2  Numeric  Magnitude 

The  first  metric  that  is  evaluated  is  the  notion  of  numeric  magnitude.  We  need  to  know 
the  highest  valued  (positive  or  negative)  number  that  is  ever  used  within  any  stage  in  the 
calculation  of  the  equation.  This  defines  the  amount  of  hardware  needed  to  hold  such  a  number. 

To determine such a number, the algorithm was written in the C language as a procedure
call and is listed in Appendix A. The procedure is called by the main routine for many
different UMDH configurations. Each of the four joints of the UMDH is controlled by nested
FOR loops, which cause the angles to sweep through each joint’s range shown in Table 3.1
(Solanki and Rattan:69). The results of the equations for each particular configuration were
written to a data file. The data file was then imported into the Matlab environment and
searched for the maximum and minimum values, as listed in Appendix A. The angle values,
including intermediate steps where up to three angles are added together, never leave the
range -360 to +360 degrees. Intermediate additions, subtractions, and multiplications stay
within -2.3864 to +3.3750. The final results of the NSAP matrix stay within -2.3864 to
+5.6271.
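The sweep described above can be sketched in a few lines. This is a reduced illustration in Python rather than the C procedure of Appendix A: it tracks only the position column of Equation 2.16 (which does not depend on the fourth joint), and the step size, the assignment of the Table 3.1 ranges to joints 1 through 3, and the value used for $a_3$ are all assumptions made for the sketch.

```python
import math

def swept(start_deg, stop_deg, step_deg):
    """Angles in radians from start towards stop in fixed steps, never passing stop."""
    count = int((stop_deg - start_deg) / step_deg + 1e-9) + 1
    return [math.radians(start_deg + k * step_deg) for k in range(count)]

def thumb_position(t1, t2, t3, a, d1):
    """Position column (Px, Py, Pz) of Equation 2.16."""
    reach = a[1] + a[2] * math.cos(t2) + a[3] * math.cos(t2 + t3)
    px = a[0] + math.cos(t1) * reach
    py = math.sin(t1) * reach
    pz = a[2] * math.sin(t2) + a[3] * math.sin(t2 + t3) + d1
    return px, py, pz

def position_extremes(a, d1, step_deg):
    """Nested loops over the joint ranges, keeping the smallest and largest value seen."""
    lo, hi = math.inf, -math.inf
    for t1 in swept(-45.0, 135.0, step_deg):
        for t2 in swept(-15.0, 60.0, step_deg):
            for t3 in swept(6.5, 90.0, step_deg):
                values = thumb_position(t1, t2, t3, a, d1)
                lo, hi = min(lo, *values), max(hi, *values)
    return lo, hi

A3_PLACEHOLDER = 1.0  # assumption: stands in for the a3 entry of Table 2.1
lo, hi = position_extremes((-0.75, 0.375, 1.700, A3_PLACEHOLDER), 3.125, 7.5)
```

The extremes found this way depend on the step size and on the placeholder link length, so they are not expected to reproduce the figures quoted above; the point is the search pattern that bounds the number of integer bits.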


Table 3.1. Kinematic Range of UMDH

Joint Angle     Range of motion in degrees
$\theta_1$      -45 to 135
$\theta_2$      -15 to 60
$\theta_3$      6.5 to 90
$\theta_4$      0 to 90

The integer portions of such numbers can be implemented directly with just four bits of
hardware (three bits represent the integers 0 to 7 and one bit holds the sign). However,
the values obtained are just a sample of the results from the entire range of the UMDH, not
an exhaustive test, so four bits represents the minimum hardware size required. Also, future
expansion to another type of manipulator may require more than four bits. Therefore, at
least four bits will be held for now for hardware implementation.

3.3 Numeric Precision

The second metric used is the numeric precision required by the system. The UMDH was
designed with metal joints that are controlled remotely via a set of tendons running around
plastic pulleys. The Coulomb friction of the joints and pulleys causes a motion deadband
every time a joint stops. The electronic control system of the UMDH attempts to track the
desired position of each joint, but it is limited by these mechanical properties. Simply
turning up the gains of the UMDH controller would not suffice, because that causes the
joints to become unstable and begin oscillating.

Therefore, to avoid decreasing performance below that of the current system and to avoid
possible truncation problems at intermediate stages in the equations, the number of decimal
bits required is set to eight. This allows for a resolution of 0.00390625 per least
significant bit, since the last bit is the placeholder for $2^{-8}$. If the value represents
an angle, then 0.00390625 degrees is a much finer precision than the UMDH could ever track.
If the value represents a Cartesian coordinate of the end of the finger, then the same
applies to 0.00390625 inches. Although the UMDH was modeled as an ideal body of rigid links,
all devices will inherently flex to some extent.
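The effect of truncation at an intermediate stage can be illustrated with a single fixed-point multiplication. The sketch below assumes a multiplier that forms the full product and then drops the extra eight fractional bits; that behavior, the rounding of the inputs, and the helper names are assumptions for illustration and are not taken from the thesis hardware.

```python
FRACTION_BITS = 8

def to_raw(value):
    """Scale a real number to an integer count of 2**-8 steps (rounded to nearest)."""
    return round(value * (1 << FRACTION_BITS))

def to_real(raw):
    """Convert an integer count of 2**-8 steps back to a real number."""
    return raw / (1 << FRACTION_BITS)

def fixed_mul(x_raw, y_raw):
    """Multiply two raw values; the right shift discards the extra eight fractional bits."""
    return (x_raw * y_raw) >> FRACTION_BITS

resolution = to_real(1)
product = to_real(fixed_mul(to_raw(1.700), to_raw(0.5)))
error = abs(product - 1.700 * 0.5)
```

In this example the fixed-point product of 1.700 and 0.5 lands within one least significant bit of the exact value 0.85, which is the kind of intermediate error the eight fractional bits are meant to keep small.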

3.4  Mathematical  Operator  Usage 

The 12 equations are examined for occurrences of additions/subtractions, multiplications,
and cosines/sines. A brute-force approach of simply counting the operations found in
Equation 2.16 results in 22 additions, three subtractions, 12 multiplications, 11 cosines,
and nine sines. However, many of the terms in the 12 equations appear in more than one
location, so the number of operations can be reduced by sharing these terms. The cosine and
the sine of the three-angle sum are each used three separate times. Similarly, the entire
last half of Px and Py are identical. If the order of calculation for the 12 equations takes
advantage of the common terms, then the number of operations can be reduced to seven
additions, three subtractions, 10 multiplications, four cosines, and four sines. This is a
68.2% decrease in additions, a 16.7% decrease in multiplications, a 63.6% decrease in
cosines, and a 55.6% decrease in sines. The subtractions remain unchanged because of the
negative signs on Sx, Sy, and Ay.
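One possible ordering that reaches the reduced operation counts is sketched below. The thesis does not list its exact schedule here, so this ordering is an assumption; the counting wrappers and function names are hypothetical, and each negation is counted as one subtraction from zero.

```python
import math
from collections import Counter

ops = Counter()

def add(x, y):
    ops["add"] += 1
    return x + y

def sub(x, y):
    ops["sub"] += 1
    return x - y

def mul(x, y):
    ops["mul"] += 1
    return x * y

def cos(x):
    ops["cos"] += 1
    return math.cos(x)

def sin(x):
    ops["sin"] += 1
    return math.sin(x)

def thumb_nsap_shared(t1, t2, t3, t4, a0, a1, a2, a3, d1):
    """Evaluate the 12 equations of Equation 2.16, reusing every common term."""
    t23 = add(t2, t3)
    t234 = add(t23, t4)
    c1, s1 = cos(t1), sin(t1)
    c2, s2 = cos(t2), sin(t2)
    c23, s23 = cos(t23), sin(t23)
    c234, s234 = cos(t234), sin(t234)
    reach = add(add(a1, mul(a2, c2)), mul(a3, c23))  # shared by Px and Py
    n = (mul(c1, c234), mul(s1, c234), s234)
    s = (sub(0.0, mul(c1, s234)), sub(0.0, mul(s1, s234)), c234)
    a = (s1, sub(0.0, c1), 0.0)
    p = (add(a0, mul(c1, reach)), mul(s1, reach),
         add(add(mul(a2, s2), mul(a3, s23)), d1))
    return n, s, a, p

# The a3 argument (1.0) is a placeholder assumption; the counts do not depend on it.
n_vec, s_vec, a_vec, p_vec = thumb_nsap_shared(0.0, 0.0, 0.0, 0.0,
                                               -0.75, 0.375, 1.700, 1.0, 3.125)
total_ops = sum(ops.values())
```

After one evaluation the counter holds seven additions, three subtractions, 10 multiplications, four cosines, and four sines, for 28 operations in total.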

3.5  Conclusions 

This chapter evaluated the equations from Chapter 2 to determine the best representation
of the numbers. We determined that the largest magnitude requires only four bits, but that
more bits for larger numbers may be required in future implementations. To keep the
precision of each number, eight decimal bits are required, giving a minimum difference of
1/256 between adjacent numbers.

Therefore, the numbers are implemented in hardware with a total of eight bits for the
integer portion and eight bits for the decimal portion. Together, the 16 bits form a
fixed-point number with the binary point in the center, between the two sets of eight bits.
This results in a maximum number of +127.99609375 and a minimum number of -128.00000000.

Finally, we determined that the 12 equations can be calculated in just 28 operations if
common terms are reused. This is a decrease of 50.9% from the original 57 operations.
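The stated range follows from reading the 16 bits as a two's-complement word with eight fractional bits. The sketch below shows one way to encode and decode such a word; the two's-complement interpretation is inferred from the quoted maximum and minimum, and the helper names are hypothetical.

```python
def to_q8_8(value):
    """Encode a real number as a 16-bit two's-complement word with 8 fractional bits."""
    raw = round(value * 256)
    if not -32768 <= raw <= 32767:
        raise OverflowError("value does not fit in 8 integer and 8 fractional bits")
    return raw & 0xFFFF

def from_q8_8(word):
    """Decode a 16-bit word back to a real number."""
    raw = word - 0x10000 if word & 0x8000 else word
    return raw / 256

largest = from_q8_8(0x7FFF)
smallest = from_q8_8(0x8000)
round_trip = from_q8_8(to_q8_8(-2.3864))
```

The largest and smallest words decode to the maximum and minimum quoted above, and a value such as -2.3864 survives a round trip to within half a least significant bit.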




4.  VHDL  Model 


4.1  Introduction 

This  chapter  discusses  the  first  step  in  the  implementation  of  the  forward  kinematic 
processor.  The  step  is  the  development  of  behavioral  VHDL  models  for  each  of  the  required 
mathematical  operations  found  in  Equation  2.16  as  well  as  temporary  register-based  memory  and 
other  structures  used  to  route  the  data  within  the  processor.  Finally,  a  structural  VHDL  model 
for  the  entire  processor  is  developed.  Each  model  is  developed  and  simulated  using  the  Synopsys 
Analyzer  and  Simulator  (Synopsys)  before  synthesis  in  Chapter  5. 

4.2  Functional  Units 

In all models, the 16-bit fixed-point representation of all numeric data is implemented
as a bit vector indexed 15 down to 0. The binary point is implied to be at the center,
between bits 8 and 7.

4.2.1 Cosine/Sine Unit.

The first functional unit developed was the cosine and sine unit. Both transcendental
functions are designed into one model, as shown in Figure 4.1. The unit calculates the
cosine or sine by means of an external lookup table. An address is generated and sent to a
ROM chip that returns the result to the cosine/sine unit. Since the specifications of the
external ROM chip were not known at the start of the design, the model incorporates the
ability to set the delay before the unit latches the results from the ROM. These wait states
allow the use of slower ROM devices.

Figure 4.1. Cosine/Sine Unit Block Diagram (signals: sel, go, reset, clk, wait, ready)

For example, if the system clock of the forward kinematic processor has a clock period of
40 ns (25 MHz) and the ROM device has an access time of 150 ns, then the number of wait
states would be set to three. Three wait states add three extra 40 ns clock cycles to the
current cycle, for a total of four cycles or 160 ns. This prevents the cosine/sine unit from
latching incorrect data early.
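The arithmetic behind the example can be written as a small helper. This is a sketch of the calculation only; how the hardware unit receives or derives its wait-state count is not specified here, and the function name is hypothetical.

```python
def wait_states(access_time_ns, clock_period_ns):
    """Extra clock cycles needed so that the total wait covers the ROM access time."""
    cycles = -(-access_time_ns // clock_period_ns)  # ceiling division on integers
    return max(cycles - 1, 0)

states = wait_states(150, 40)
total_ns = (states + 1) * 40
```

For the 150 ns ROM and 40 ns clock of the example, the helper gives three wait states and a total of 160 ns before the data is latched.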

The state machine is shown in Figure 4.2. A reset signal during any state will force the
system to state 0. In state 0, the ready output signal is not asserted, the number of wait states is
calculated, the temporary counter is set to zero, and the lookup table address is formed and sent to
the external ROM. To form the address, the unit takes a 16-bit vector as input and extracts the
lower 11 bits, which represent three integer bits and eight fractional bits. The highest
bit, representing the sign, is also extracted. Finally, an input signal called sel, which selects
cosine or sine, is added, and these 13 bits form the address into the ROM lookup table
containing the results of both cosine and sine.
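
The address formation can be illustrated with a short Python sketch. The text gives the three fields (11 magnitude bits, one sign bit, one sel bit) but not their order within the 13-bit address, so the layout below (sel as the top bit, sign next, magnitude in the low bits) is an assumption, as is the function name `rom_address`.

```python
def rom_address(angle_word: int, sel: int) -> int:
    """Build a 13-bit lookup address from a 16-bit angle word and the sel input.

    Assumed layout (not specified in the text): bit 12 = sel, bit 11 = sign,
    bits 10..0 = the lower 11 bits of the angle (3 integer + 8 fractional).
    """
    magnitude = angle_word & 0x07FF       # lower 11 bits of the input vector
    sign = (angle_word >> 15) & 0x1       # highest bit of the input vector
    return ((sel & 0x1) << 12) | (sign << 11) | magnitude


if __name__ == "__main__":
    print(hex(rom_address(0x8123, 1)))
```

Whatever ordering the real design uses, the address space is 2^13 = 8192 entries, which holds both the cosine and the sine tables.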


Figure  4.2.  Cosine/Sine  Unit  State  Machine 

The unit will stay in state 0 until the go input signal is asserted. Once in state 1, it will stay
there, incrementing the counter until it matches the precalculated number of wait states. It will
then move to state 2, where the results from the ROM lookup table are latched into the output
bus. The unit then transitions to state 3 at the next rising edge of the clock and the ready output
signal is asserted. The next transition on the rising edge of the clock is back to state 0, where it
waits for the next cycle.
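
The four-state sequence described above can be sketched as a cycle-level Python model. It is a behavioral illustration only: the class name `CosSinUnit`, the helper `lookup`, and the exact cycle at which the counter is compared are assumptions, and the ROM is replaced by a plain callable.

```python
class CosSinUnit:
    """Cycle-level sketch of the four-state cosine/sine unit."""

    def __init__(self, rom, wait_states):
        self.rom = rom                  # callable: address -> data word (stands in for the ROM)
        self.wait_states = wait_states  # extra cycles to wait before latching
        self.state = 0
        self.counter = 0
        self.address = 0
        self.result = None
        self.ready = False

    def clock(self, go=False, reset=False, address=0):
        """Advance the unit by one rising clock edge."""
        if reset:                       # reset forces state 0 from any state
            self.state, self.counter, self.ready = 0, 0, False
        elif self.state == 0:           # idle: clear ready, zero counter, present address
            self.ready = False
            self.counter = 0
            self.address = address
            if go:
                self.state = 1
        elif self.state == 1:           # count until the wait states have elapsed
            if self.counter >= self.wait_states:
                self.state = 2
            else:
                self.counter += 1
        elif self.state == 2:           # latch the ROM data onto the output bus
            self.result = self.rom(self.address)
            self.state = 3
        else:                           # state 3: assert ready, then return to idle
            self.ready = True
            self.state = 0


def lookup(unit, address, max_ticks=100):
    """Start one lookup and count clock ticks until ready is asserted."""
    unit.clock(go=True, address=address)
    ticks = 1
    while not unit.ready:
        unit.clock()
        ticks += 1
        if ticks > max_ticks:
            raise RuntimeError("unit never asserted ready")
    return unit.result, ticks
```

In this sketch each additional wait state lengthens a lookup by exactly one clock tick, which is the behavior the testbench exercises across the possible wait-state settings.
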

The behavioral VHDL model for the cosine/sine unit is listed in Appendix B.1.1. The
VHDL testbench code and results for it are listed in Appendix B.1.2. The testbench sends the
unit through the eight possible wait states with a simulated external ROM. These results are
shown in Appendix B.1.3.

4.2.2  Adder/Subtractor  Unit. 

The adder and subtractor are contained within one functional unit. The subtractor is
implemented using the adder model and inverting the secondary input before applying it to the
adder. In both cases, two 16-bit numbers are input into the unit and one 16-bit number is output
as shown in Figure 4.3. There are no provisions for overflow or underflow conditions because,
given the nature of the operands, neither condition should ever occur.

Figure 4.3. Adder/Subtractor Unit Block Diagram (control signals: reset, sel, go, clk, done)

The unit starts at idle in state 0 as shown in Figure 4.4. When the go input signal is asserted,
the unit starts by calculating the sum and carry terms of Equations 4.1 and 4.2 for the least
significant bits, where A and B are the input bits and C is the carry in from the previous bit
(Weste and Eshraghian:517).

$\text{Carry} = AB + C(A + B)$    (Equation 4.1)

$\text{Sum} = ABC + (A + B + C)\,\overline{\text{Carry}}$    (Equation 4.2)

Each clock tick causes the unit to progress to the next state and
calculate the next bit. After sixteen clock ticks, all sums have been calculated and the result is
sent to the output bus. A done output signal is asserted indicating completion and the state
machine returns to state 0 in preparation for another addition or subtraction.


Figure 4.4. Adder/Subtractor Unit State Machine

Typically, an adder/subtractor would not be implemented as a state machine requiring at
least 16 clock ticks. However, since the target platform is an FPGA, and the timing of the
synthesized design will not be known until Chapter 5, it is not yet possible to determine how long
the sum and carry terms will take to ripple through to the final result. Therefore,
the unit indicates to the surrounding system when it has completed the final state by asserting the
done signal. If at any time the reset signal is asserted, the unit is forced back to state 0.
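
A bit-serial adder built on Equations 4.1 and 4.2 can be sketched in Python as follows. The sum term uses the complement of the carry, as in the standard full-adder identity. The function names are hypothetical, and the carry-in of 1 used for subtraction is an assumption (the usual two's-complement convention); the text only states that the secondary input is inverted.

```python
def full_adder_bit(a: int, b: int, c: int):
    """One bit of Equations 4.1 and 4.2: returns (sum, carry)."""
    carry = (a & b) | (c & (a | b))                     # Carry = AB + C(A+B)
    total = (a & b & c) | ((a | b | c) & (carry ^ 1))   # Sum = ABC + (A+B+C)*not(Carry)
    return total, carry


def add_sub_16(x: int, y: int, subtract: bool = False) -> int:
    """Process one bit per 'clock tick', least significant bit first."""
    if subtract:
        y = ~y & 0xFFFF            # invert the secondary input
    carry = 1 if subtract else 0   # assumed carry-in of 1 completes the two's complement
    result = 0
    for bit in range(16):          # sixteen ticks, one per bit position
        s, carry = full_adder_bit((x >> bit) & 1, (y >> bit) & 1, carry)
        result |= s << bit
    return result                  # the final carry is dropped: no overflow handling


if __name__ == "__main__":
    print(add_sub_16(5, 3), add_sub_16(5, 3, subtract=True))
```

As in the unit described above, the sketch makes no provision for overflow or underflow; the final carry out of bit 15 is simply discarded.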

The behavioral VHDL model for the adder/subtractor unit is listed in Appendix B.2.1.
The VHDL testbench code for it is listed in Appendix B.2.2. The testbench sends the unit
through 30 different additions and 30 different subtractions. These results are shown in Appendix
B.2.3.

4.2.3  Multiplier  Unit. 

The multiplier unit has the same data interface as the adder/subtractor unit. Figure 4.5
shows the two 16-bit inputs and one 16-bit result. Once again there are no provisions for
overflow or underflow. Typically, two 16-bit numbers multiplied together would produce a 32-bit
result, but in this specific implementation the numbers should never exceed 16 bits, a constraint
of the 16-bit architecture.

The  multiplier  actually  uses  a  modified  copy  of  the  adder/subtractor  inside  its  design.  The 
adder/subtractor  is  extended  to  32-bits  to  handle  the  accumulation  of  the  partial  products.  The 
multiplier  follows  the  same  basic  data  flow  as  the  adder/subtractor  except  that  it  requires  many 
more  states  to  calculate  the  result.  Figure  4.6  shows  the  state  machine  for  the  multiplier  unit. 

Figure 4.5. Multiplier Unit Block Diagram


Figure  4.6.  Multiplier  Unit  State  Machine 

The multiplier stays idle in state 0 until the go input signal is asserted. Each of the 16 partial
products is calculated and then added to a running total to form the final result. Similar to the
adder/subtractor unit, when the final state is reached, an output signal, ready, is asserted to
indicate to the surrounding system that multiplication is complete. If at any time the reset signal
is asserted, the unit is forced back to state 0.
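
The accumulation of 16 partial products in a 32-bit adder can be sketched as a shift-and-add loop. This Python sketch treats both operands as unsigned integers, which is a simplification: the handling of the sign bit, the fixed-point scaling, and the way the 16-bit result is taken from the 32-bit accumulator are not described in this section and are not modeled. The name `multiply_16` is hypothetical.

```python
def multiply_16(x: int, y: int) -> int:
    """Unsigned shift-and-add multiply with a 32-bit accumulator."""
    x &= 0xFFFF
    y &= 0xFFFF
    acc = 0
    for bit in range(16):
        # Partial product: x shifted into place if this multiplier bit is set, else zero.
        partial = (x << bit) if (y >> bit) & 1 else 0
        acc = (acc + partial) & 0xFFFFFFFF   # 32-bit accumulation of partial products
    return acc


if __name__ == "__main__":
    print(multiply_16(300, 200))
```

The loop runs once per multiplier bit, which mirrors why the multiplier needs many more states than the adder/subtractor.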

The  behavioral  VHDL  model  for  the  multiplier  model  is  listed  in  Appendix  B.3.1.  The 
VHDL  testbench  code  for  it  is  listed  in  Appendix  B.3.2.  The  testbench  sends  the  unit  through  the 
same  30  inputs  as  the  adder/subtractor  but  multiplies  rather  than  adds  or  subtracts.  These  results 
are  shown  in  Appendix  B.3.3. 

4.2.4  Register  File  Unit. 

The register file unit is used to store the starting angles of the UMDH, certain constants
from the DH table, temporary and intermediate calculations, and the 12 equation results. It is
designed to hold 16-bit numbers in any of 32 different locations, except for the first two
locations. The first location is hard wired to always hold a value of zero and the second location
is hard wired to hold a value of one. This was designed in early on because of the expected need to
increment by one, and to allow moves from one location to another through the adder/subtractor
unit with one of the inputs being zero.

It is designed with one 16-bit input bus called the C bus and two 16-bit output buses
called the A and B buses as shown in Figure 4.7. The data on the C bus is written to any of the
remaining 30 locations by use of the C bus address and a latch signal. Data can be read from any
of the 32 locations onto both the A and B buses by using the A and B addresses. If the reset signal
is asserted, the 30 locations are forced to zero.

Figure 4.7. Register File Unit Block Diagram (signals shown: C address, C, reset, clk, A address, B address)

The behavioral VHDL model for the register file unit is listed in Appendix B.4.1. The
VHDL testbench code for it is listed in Appendix B.4.2. The testbench has three parts. In the
first part, a reset is asserted, the zero register and one register are verified, and the
remaining 30 registers are checked to confirm that they were forced to zero. In the second part,
all 32 registers are given test values. In the third part, all 32 registers are read again, showing
that all but the two hard wired registers accepted the values. These results are shown in
Appendix B.4.3.
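
The register file behavior and the three-part testbench can be mirrored in a small Python model. This is a sketch for illustration, not the VHDL of Appendix B.4.1; the class and method names are hypothetical, and the latch signal is folded into the `write` call.

```python
class RegisterFile:
    """32 x 16-bit register file with location 0 fixed at zero and location 1 fixed at one."""

    def __init__(self):
        self.regs = [0] * 32
        self.reset()

    def reset(self):
        """Force the 30 writable locations to zero; keep the two hard wired values."""
        self.regs = [0] * 32
        self.regs[1] = 1

    def write(self, c_address: int, c_data: int):
        """Latch C bus data into a location; writes to locations 0 and 1 are ignored."""
        if c_address >= 2:
            self.regs[c_address] = c_data & 0xFFFF

    def read(self, a_address: int, b_address: int):
        """Drive the A and B buses from two locations at once."""
        return self.regs[a_address], self.regs[b_address]


if __name__ == "__main__":
    rf = RegisterFile()
    for addr in range(32):          # part two of the testbench: write every location
        rf.write(addr, 0x1000 + addr)
    print(rf.read(0, 1), rf.read(2, 31))
```

Reading back after the writes shows the same outcome the testbench checks: locations 0 and 1 keep their hard wired values and the other 30 hold the test data.
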

4.2.5 Latches and Multiplexors.

The latches and multiplexors are required in the design as glue logic between the other
functional units. To start, there is a 16-bit latch as shown in Figure 4.8. When its latch signal is
asserted, the input bus is transferred to the output and held at that value until the next time the
latch signal is asserted. This design requires two latches as described in the next section. The
behavioral model for the latch is found in Appendix B.5.1 and its testbench is located in B.5.2.
The results of the testbench are found in Appendix B.5.3.


Figure 4.8. Latch Unit Block Diagram (control signal: latch)

Also required is a multiplexor as shown in Figure 4.9. It directs one of four inputs to a
single output. The multiplexor is 16 bits wide for all inputs and outputs and is controlled by two
select input signals that determine which of the four paths is used.


Figure 4.9. Multiplexor Unit Block Diagram (control signal: sel)

The behavioral VHDL model for the multiplexor is found in Appendix B.6.1 and its
testbench is located in B.6.2. The results of the testbench are found in Appendix B.6.3.
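
The two glue-logic elements are simple enough to capture in a few lines of Python. The names `Latch16` and `mux4` are hypothetical, and the way the two select signals combine into a path number (sel1 as the high bit) is an assumption.

```python
class Latch16:
    """16-bit latch: passes the input to the output when latch is asserted, else holds."""

    def __init__(self):
        self.output = 0

    def update(self, data_in: int, latch: bool) -> int:
        if latch:
            self.output = data_in & 0xFFFF
        return self.output


def mux4(inputs, sel1: int, sel0: int) -> int:
    """Route one of four 16-bit inputs to the output, chosen by two select signals."""
    index = ((sel1 & 1) << 1) | (sel0 & 1)   # assumed ordering: sel1 is the high bit
    return inputs[index] & 0xFFFF


if __name__ == "__main__":
    latch = Latch16()
    latch.update(0xBEEF, latch=True)
    print(hex(latch.update(0x1234, latch=False)), hex(mux4([1, 2, 3, 4], 1, 0)))
```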

4.2.6  FKP  Core. 

The functional units designed so far are brought together to form the core of the Forward
Kinematic Processor (FKP). This core encapsulates the functional units such that they appear like
a single large functional unit. Two latches and one multiplexor are used to glue the other
functional units together so that data can travel from unit to unit in a productive manner. Figure
4.10 shows the connections of the units inside the core. There is one 16-bit data input bus, which
is routed to the input data latch. From there, the data is passed through the multiplexor and back
around to the register file for storage. Once data is loaded into the registers, it can be sent to
the cosine, sine, addition, subtraction, or multiplication units and routed back around to the
register file via the multiplexor again. When the desired computations are complete, the data in a
register is sent to the output latch and then to the output bus. To control the dataflow, all of the
control signals from each of the functional units are passed out as control signals for the core unit.
This model does not handle the actual control of the core, but rather gives one concise shell for
everything inside it.


Figure  4.10.  FKP  Core  Block  Diagram 

The structural VHDL model of the FKP core is shown in Appendix B.7.1 with the
testbench in B.7.2. The testbench performs the actions described above on some data. It was
designed to prove functionality of the core since each subunit has already been verified. The
results are shown in Appendix B.7.3.

4.2.7  Microcode  Store. 

This section defines the instruction set of the processor. Because this is an application
specific design, the instruction set contains only commands for moving data in and out, and
performing one of the arithmetic or transcendental operations. Table 4.1 shows all possible
instructions utilized within the processor. The microcode for each instruction is derived from the
testbench of the FKP core. Since the FKP core does not supply autonomous control over the
functional units, each simulated instruction was hard coded in sequence. The microcode store
turns each simulated instruction into a procedure (opcode) call whose parameters (operands)
are passed into the procedure. All procedures are contained in a
package model that can be called by the control unit of the next section.

The behavioral VHDL package model of the instructions is shown in Appendix B.8.1
with the testbench in Appendix B.8.2 performing the same operations as the FKP core testbench.
The results in Appendix B.8.3 show that the microcode procedures perform
identically to the hard coded testbench of the FKP core.

Table 4.1. FKP Instruction Set

Instruction           Description
move_in (R, data)     Latch input bus, pass data through multiplexor to register R
move_out (data, R)    Move data out of register R, through output latch to output bus
add (R1, R2, R3)      Send data from two registers (R2 and R3) to two inputs of
                      adder/subtractor unit, add, send result back to register R1
sub (R1, R2, R3)      Send data from two registers (R2 and R3) to two inputs of
                      adder/subtractor unit, subtract, send result back to register R1
mult (R1, R2, R3)     Send data from two registers (R2 and R3) to two inputs of
                      multiplier unit, multiply, send result back to register R1
cos (R1, R2)          Send data from register R2 to input of cosine/sine unit,
                      perform cosine, send result back to register R1
sin (R1, R2)          Send data from register R2 to input of cosine/sine unit,
                      perform sine, send result back to register R1
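
The instruction set of Table 4.1 maps naturally onto a set of procedures acting on a register file. The following Python sketch shows that mapping under loud assumptions: it uses floating-point values and `math.cos`/`math.sin` in place of the 16-bit fixed-point datapath and ROM lookup, the class name `FKPModel` is hypothetical, and `move_out` returns the value rather than driving an output bus.

```python
import math


class FKPModel:
    """Behavioral stand-in for the FKP instruction procedures (floats, not 16-bit words)."""

    def __init__(self):
        self.r = [0.0] * 32
        self.r[1] = 1.0                    # register 1 is hard wired to one

    def _write(self, dst, value):
        if dst >= 2:                       # registers 0 and 1 cannot be overwritten
            self.r[dst] = value

    def move_in(self, dst, data):
        self._write(dst, data)

    def move_out(self, src):
        return self.r[src]

    def add(self, r1, r2, r3):
        self._write(r1, self.r[r2] + self.r[r3])

    def sub(self, r1, r2, r3):
        self._write(r1, self.r[r2] - self.r[r3])

    def mult(self, r1, r2, r3):
        self._write(r1, self.r[r2] * self.r[r3])

    def cos(self, r1, r2):
        self._write(r1, math.cos(self.r[r2]))

    def sin(self, r1, r2):
        self._write(r1, math.sin(self.r[r2]))


if __name__ == "__main__":
    fkp = FKPModel()
    fkp.move_in(2, 1.5)
    fkp.add(3, 2, 1)        # increment using the hard wired one
    fkp.add(4, 3, 0)        # register-to-register move through the adder
    print(fkp.move_out(4))
```

The usage lines show the two idioms the hard wired registers were intended for: incrementing by one and moving a value between registers by adding zero.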

4.2.8  Control  Unit. 

The control unit can now utilize the microcode store package to make the FKP core
perform the various instructions without the burden of worrying about dataflow on every single
clock tick. The control unit interfaces with the outside world via a six-bit control port and
a seven-bit command port as shown in Tables 4.2 and 4.3 respectively. The control unit is a shell
for the microcode store and the FKP core as shown in Figure 4.11.


Table 4.2. Control Port

Bit #     5       4       3        2       1              0
Name      Clock   Reset   Strobe   Ready   DataGetValid   DataGetAck
IN/OUT    IN      IN      IN       OUT     OUT            IN

Table 4.3. Command Port

Description     CMD1   CMD0   A4   A3   A2   A1   A0
Bit #           6      5      4    3    2    1    0
Set Register    0      0      A4   A3   A2   A1   A0
Get Register    0      1      A4   A3   A2   A1   A0
Run             1      0      X    X    X    X    X

The clock input is the overall system clock for the processor. The reset is the overall
system reset for the processor. The remaining bits of the control port are utilized in conjunction
with the command port. After system reset, the ready output signal is asserted, indicating that the
processor is available to perform one of the three functions: set register, get register, or run. The
user sets the CMD0 and CMD1 bits to correspond to the desired function and asserts the strobe
input signal. The processor will deassert the ready signal, evaluate the command port, and take
the appropriate action. When the function is complete, the ready signal is reasserted.

If the function is a set register, then the 16-bit input data bus is latched in and routed to
the register designated by bits A4-A0 of the command port. If the function is a get register, then
the contents of the register designated by bits A4-A0 are sent through the output latch to the
16-bit data output bus. Finally, if the function is run, then the A4-A0 bits are ignored and the
predetermined sequence of instructions is executed.
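
Decoding the seven-bit command port of Table 4.3 can be sketched as follows. The function name `decode_command` and the returned tuple format are hypothetical. The table does not define the combination CMD1 = 1, CMD0 = 1, so the sketch rejects it rather than guess at its meaning.

```python
def decode_command(port: int):
    """Split a 7-bit command word into (function, register address)."""
    cmd = (port >> 5) & 0b11      # bits 6 and 5: CMD1, CMD0
    address = port & 0b11111      # bits 4..0: A4-A0
    if cmd == 0b00:
        return ("set", address)
    if cmd == 0b01:
        return ("get", address)
    if cmd == 0b10:
        return ("run", None)      # A4-A0 are ignored for run
    raise ValueError("CMD1=1, CMD0=1 is not defined in Table 4.3")


if __name__ == "__main__":
    print(decode_command(0b0000111), decode_command(0b0111111), decode_command(0b1010101))
```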

The sequence is arranged to take advantage of any common terms found in the 12
equations of Chapter 2. Chapter 3 evaluated the equations and determined that there would be
seven additions, three subtractions, 10 multiplications, four cosines, and four sines. This would
require a total of 28 instructions. However, this did not account for the data moves into and out of
the processor using the set and get functions. Table 4.4a shows the operations involved in
moving the angles, and where necessary the constants, into the registers. The register locations
that hold this data are fixed because the run function expects the correct data in
these locations. The first time these data values are loaded, both the constants (steps 1a to 5a)
and the angles (steps 1b to 4b) are required. From then on, only the new set of angles is needed,
because the constants do not change and are not overwritten unless power is lost or the system
is reset.


Table 4.4a. Operations Involved with the Set Function

Step #   Register #   Instruction and Description
1a       2            move_in (2, a0) = move link length 0 into register 2
2a       3            move_in (3, a1) = move link length 1 into register 3
3a       4            move_in (4, a2) = move link length 2 into register 4
4a       5            move_in (5, a3) = move link length 3 into register 5
5a       6            move_in (6, d1) = move link offset 1 into register 6
1b       7            move_in (7, θ1) = move theta 1 into register 7
2b       8            move_in (8, θ2) = move theta 2 into register 8
3b       9            move_in (9, θ3) = move theta 3 into register 9
4b       10           move_in (10, θ4) = move theta 4 into register 10

With the constants and angles loaded, the run function can be initiated. Table 4.4b shows
the internal steps involved in calculating the results of the twelve equations. Step 18 is one extra
add, which moves the zero held in the zero register into register 28.


Table 4.4b. Internal Operations During Run Function

Step #   Register #   Instruction and Description
2        11           cos(11, 7) = cos(θ1)
3        12           sin(12, 8) = sin(θ2)
         13           cos(13, 8) = cos(θ2)
         14           add(14, 8, 9) = θ2+θ3
         14           add(14, 14, 10) = θ2+θ3+θ4
         15           sin(15, 14) = sin(θ2+θ3)
         16           cos(16, 14) = cos(θ2+θ3)
         17           mult(17, 4, 13) = a2 cos(θ2)
         17           add(17, 17, 18) = a2 cos(θ2) + a3 cos(θ2+θ3)
         17           add(17, 17, 3) = a1 + a2 cos(θ2) + a3 cos(θ2+θ3)
         18           mult(18, 5, 16) = a3 cos(θ2+θ3)
         18           mult(18, 17, 11) = cos(θ1)(a1 + a2 cos(θ2) + a3 cos(θ2+θ3))
         19           mult(19, 4, 12) = a2 sin(θ2)
         20           mult(20, 11, 25) = cos(θ1)cos(θ2+θ3+θ4)
         21           mult(21, 26, 25) = sin(θ1)cos(θ2+θ3+θ4)
         22           sin(22, 14) = sin(θ2+θ3+θ4)
         23           mult(23, 11, 22) = cos(θ1)sin(θ2+θ3+θ4)
         23           sub(23, 0, 23) = -(cos(θ1)sin(θ2+θ3+θ4))
         24           mult(24, 26, 22) = sin(θ1)sin(θ2+θ3+θ4)
         24           sub(24, 0, 24) = -(sin(θ1)sin(θ2+θ3+θ4))
         25           cos(25, 14) = cos(θ2+θ3+θ4)
         26           sin(26, 7) = sin(θ1)
         27           sub(27, 0, 11) = -cos(θ1)
18       28           add(28, 0, 0) = 0
24       29           add(29, 18, 2) = a0 + cos(θ1)(a1 + a2 cos(θ2) + a3 cos(θ2+θ3))
25       30           mult(30, 17, 26) = sin(θ1)(a1 + a2 cos(θ2) + a3 cos(θ2+θ3))
27       31           mult(31, 5, 15) = a3 sin(θ2+θ3)
28       31           add(31, 31, 19) = a2 sin(θ2) + a3 sin(θ2+θ3)
29       31           add(31, 31, 6) = a2 sin(θ2) + a3 sin(θ2+θ3) + d1

The  get  functions  can  now  be  used  to  retrieve  the  last  12  registers  for  the  results  of  the  12 
equations.  Each  value  is  moved  out  one  at  a  time  and  in  any  order  the  user  desires. 
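
The run sequence can be checked against the closed-form expressions with a short Python sketch. Several things here are assumptions and not taken from the thesis: the execution order is chosen only so that every operand is computed before it is used, floating-point arithmetic replaces the 16-bit fixed-point datapath, the source operands of `sin(12, ...)` and `mult(17, 4, ...)` are taken as registers 8 and 13 so that they match the descriptions sin(θ2) and a2 cos(θ2), and the names `PROGRAM`, `load_registers`, and `run_sequence` are hypothetical.

```python
import math

# (opcode, destination, sources...) in an assumed dependency-respecting order.
PROGRAM = [
    ("cos", 11, 7), ("sin", 12, 8), ("cos", 13, 8),
    ("add", 14, 8, 9), ("sin", 15, 14), ("cos", 16, 14),
    ("add", 14, 14, 10),
    ("mult", 17, 4, 13), ("mult", 18, 5, 16),
    ("add", 17, 17, 18), ("add", 17, 17, 3),
    ("mult", 18, 17, 11), ("mult", 19, 4, 12),
    ("cos", 25, 14), ("sin", 26, 7),
    ("mult", 20, 11, 25), ("mult", 21, 26, 25),
    ("sin", 22, 14),
    ("mult", 23, 11, 22), ("sub", 23, 0, 23),
    ("mult", 24, 26, 22), ("sub", 24, 0, 24),
    ("sub", 27, 0, 11), ("add", 28, 0, 0),
    ("add", 29, 18, 2), ("mult", 30, 17, 26),
    ("mult", 31, 5, 15), ("add", 31, 31, 19), ("add", 31, 31, 6),
]

BINARY_OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mult": lambda a, b: a * b,
}


def load_registers(a0, a1, a2, a3, d1, t1, t2, t3, t4):
    """Set-function loads of Table 4.4a: registers 2-6 hold constants, 7-10 hold angles."""
    regs = [0.0] * 32
    regs[1] = 1.0
    regs[2:11] = [a0, a1, a2, a3, d1, t1, t2, t3, t4]
    return regs


def run_sequence(regs):
    """Execute the 29 internal instructions; results end up in registers 20-31."""
    for op, dst, *src in PROGRAM:
        if op == "cos":
            regs[dst] = math.cos(regs[src[0]])
        elif op == "sin":
            regs[dst] = math.sin(regs[src[0]])
        else:
            regs[dst] = BINARY_OPS[op](regs[src[0]], regs[src[1]])
    return regs


if __name__ == "__main__":
    out = run_sequence(load_registers(0.5, 1.0, 2.0, 1.5, 0.25, 0.3, 0.4, 0.5, 0.6))
    print(out[20:32])
```

Registers 20 through 31 then hold the twelve results, which the get function would read out one at a time.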

The structural VHDL model of the Forward Kinematic Processor is shown in Appendix
B.9.1.

4.3 Conclusions

This chapter developed the models of each of the required functional units. Each model
was tested as a stand-alone design before integration into the Forward Kinematic Processor.
Once the initial five constants are loaded in, the processor takes four instructions to load the
angles, 29 instructions to calculate the results, and 12 instructions to get them out, for a total of
45 instructions. The processor was then tested from the topmost level of the design model. With
the simulation of the processor complete, the next step in the implementation is synthesis to an
FPGA. This is described in Chapter 5.

5. VHDL To FPGA Synthesis


5.1  Introduction 

The goal of this chapter is to move the FKP design modeled in the hardware description
language straight to an FPGA implementation. The models were behavioral descriptions of the
functional units with a top level structural description of the entire processor. At this level of
abstraction, there is no implied physical architecture. We have not even worked with a gate level
representation of the design. The synthesis into an FPGA induces an explicit physical architecture
based on the target device, in this case the Xilinx 4020E.

5.2  VHDL  Source  Restrictions 

VHDL was originally designed as a simulation and modeling language. The concept of
synthesis directly from the model was not included in the design of the language. Therefore, some
of the constructs found in VHDL are not synthesizable. The most obvious limitation is the use of
specific time delays. For example, a statement such as “wait for 10ns” or “A <= B after 5ns” has no
meaning to a synthesis tool because there is no on-chip clock to direct when the action is to take
place. Also, constructs such as access types, records, recursive subprograms, and
multidimensional arrays are non-synthesizable (Raines; Ailes:21).

Most of these restrictions were known when development of the Chapter 4 models began,
but some unexpected and potentially detrimental constraints appeared as the
design moved on. The first was the use of more than one signal inside a process sensitivity list.
Typically, many signals can be listed in the sensitivity list of a process, indicating that the
process executes if any of the listed signals changes state. The synthesis tools could only handle one
signal in the list. A process that depends on both the clock and the reset signal would cause
errors during synthesis. To work around this problem, almost all sensitivity lists were left empty,
forcing continuous execution, with the clock event placed in a separate wait statement
within the process body. The second problem pertains to the need to assert a signal for one clock
period and then deassert it on the next clock period. Such an event implies a clock wait between
the two transitions, but only one wait statement is allowed on each pass through the process body.
As a result, a streamlined hardware description such as “A<=B; wait until clock tick;
A<=not(B); wait until clock tick” must be unrolled into an explicit state machine in which execution
through the process body takes a different path for each state. Each state then contains its own
assignment, “A<=B” or “A<=not(B)”, and there is only one wait statement for all paths.

5.3  Design  Flow 

There are four major tools used to perform the synthesis step. The Synopsys VHDL
analyzer is used to compile the VHDL code. This includes compilation of the testbenches for
each functional unit. The functional units are then simulated with the Synopsys VHDL simulator.
These two tools together, both executing on a UNIX platform, form the primary development
tools of the models (Synopsys). Because neither the Analyzer nor the Simulator is aimed at
synthesis, the restrictions from section 5.2 are ignored at this stage and left for the later tools. The
other three major tools are Synopsys Design Analyzer and Exemplar Leonardo for synthesis, and
Xilinx XACTstep for mapping.

5.3.1 Synopsys Design Analyzer

The Synopsys Design Analyzer started out as the primary UNIX synthesis tool. Within
the Design Analyzer is a feature called the FPGA Compiler. It accepts VHDL as input and
attempts to produce a hybrid Synopsys/Xilinx netlist. The drawback to using this tool is its
turnaround time. Typically, a small model such as the cosine/sine unit will take upwards of two
hours to generate the netlist (Synopsys).

5.3.2  Exemplar  Leonardo 

The PC/Windows 95 based Exemplar Leonardo application turned out to be quicker than
Synopsys and much easier to learn and use. The following sequence describes the path used to
generate a correctly targeted netlist (Exemplar). First, the program is loaded, which brings up the
startup screen shown in Figure 5.1.


Figure 5.1. Exemplar Logic Leonardo Startup Screen (Leonardo V4.0.3)

The first action taken is to click on the Flow Guide button. The Flow Guide shown in
Figure 5.2 appears. Because we wish to customize certain aspects of the design, the Customize
Flow Guide button is clicked. Another window appears that allows us to inform the tool that the
design consists of multiple VHDL files, because many of the functional units depend on a package
or header file. We also select the options for packing the configurable logic blocks (CLB) of a
Xilinx FPGA, decomposition of Look Up Tables (LUT), and reporting of area used as shown in
Figure 5.3. The result is a variation of Figure 5.2 with the extra steps added into the design Flow
Guide of Figure 5.4.


Figure 5.2. Leonardo Flow Guide (steps: Load Library, Read, Pre-optimize, Optimize, Report Area, Report Delay, Write)

The first button, Load Library, is selected and we choose the 4000E family as shown in
Figure 5.5. The second button is used repeatedly to read in and analyze the VHDL files. A
window appears that allows the filename to be input as shown in Figure 5.6. As each file is
read in, any warning messages regarding synthesis problems are displayed.


Figure 5.3. Customize Flow Guide

Once all the VHDL files are loaded in, the design is elaborated based on the top level
entity description. Figure 5.7 shows the Elaborate window. Clicking the elaborate button
automatically determines what the top level is and considers its port declaration as the I/O of the
design. Next, the Pre-Optimize step is accomplished, shown in Figure 5.8, followed by the
selection of the Modgen Library in Figure 5.9, and the resolution of the Modgens shown in Figure
5.10.

Figure 5.4. Customized Flow Guide

Figure 5.7. Elaborate

Figure 5.8. Pre Optimize

Figure 5.9. Load Modgen Library

Figure 5.10. Resolve Modgen

The heart of this design flow is the Optimize step, where we can choose what type of
optimization to do. The exhaustive selection will require multiple hours to complete. On the
other hand, a quick optimization may only require five to 10 minutes. Because we are primarily
concerned with area and not with speed, the area optimization box is checked as shown in Figure
5.11. The results of the optimization are shown in Figure 5.12, but the numbers are not entirely
accurate. The critical path is listed as 29 ns. However, the design has not yet been placed and
routed on the chip. We will see later in Chapter 6 that the critical path is closer to 100 ns.


Figure5.il.  Optimize 


Figure  5.12.  Results  of  Optimization 


The  optimized  design  is  then  packed  into  the  CLBs  by  using  the  window  shown  in  Figure 


5.13,  followed  by  decomposing  the  LUTs  within  the  CLBs  shown  in  Figure  5.14. 


Figure  5.13.  Pack  CLBs 


Figure  5.14.  Decompose  LUTs 

The final step is the writing of the Xilinx Netlist Format (XNF) file to disk, as shown in Figure 5.15.


Figure  5.15.  Write  XNF 


5.3.3 Xilinx XACTstep M1

The Xilinx XACTstep program picks up where the Exemplar tools stop. It reads the XNF file and sets up a project manager screen that keeps track of the version and revision of the design, as shown in Figure 5.16. Once loaded as a project, the design is implemented as shown in Figure 5.17. The target device is chosen, along with the current version and revision number. Additional options, shown in Figure 5.18, allow a constraint file to be added to the design. In this case, a UCF file is used to lock certain I/O names to actual pins on the FPGA. The configuration template can also be edited from this screen. Figure 5.19 shows the configuration options screen. Both the inputs and the outputs are set to CMOS thresholds, and the DONE, M0, M1, and M2 mode pins are set to have an internal pull-up resistor.


Figure 5.16. XACTstep Design Manager


Figure  5.17.  Implementation  Window 


Figure  5.18.  Implementation  Options 


The Flow Engine is now invoked and the process of translating, mapping, placing and routing, and configuring is performed. Figure 5.20 shows the Flow Engine and the results of a synthesized design. The result is a BIT file that is ready for download into the FPGA.

Figure 5.19. Configuration Options


Figure  5.20.  Flow  Engine 
5.4  Bitstream  file  to  FPGA 


The BIT file is downloaded to the FPGA using the Hardware Debugger utility of the XACTstep program. An X-Checker cable is used between the FPGA and the host computer's serial port. The Hardware Debugger then sends the proper headers, frames of data, and trailers down the X-Checker cable and into the FPGA.

5.5 Conclusions


This chapter discussed the procedures for synthesizing VHDL models to FPGA implementations. The process works; however, the FKP cannot fit entirely on the target 4020E FPGA. If the target FPGA were much larger in capacity than the 4020E, then in theory the entire design could be placed into one device. Instead, half of the register file unit is pushed through Exemplar Leonardo and Xilinx XACTstep and programmed into the 4020E that is available in the laboratory. Figure 5.21 shows the CLB and routing layout for the register file in the 4020E. This design used 40% of the total available CLBs, 27% of the total available IOBs, and 12% of the total CLKIOBs of the 4020E. A text log of the XACTstep process from XNF format to BIT format is listed in Appendix C.


Figure  5.21.  4020E  CLB  and  Routing  for  the  Half  Register  File  Unit 




6.  FPGA  Verification 


6.1  Introduction 

This chapter investigates the physical implementation of one of the functional unit models in a Xilinx 4020E FPGA. The Logic Master XL100 by Integrated Measurement Systems (IMS) serves as the testbed for the programmed device. Because the 4020E package is not directly compatible with the IMS, a custom adapter is developed.

6.2 IMS Logic Master XL100 Tester

The IMS Logic Master XL100, shown in Figure 6.1, can support up to 100 MHz data and clock rates with up to 224 I/O channels. To test the 4020E FPGA, one XL PGA Auto Socket Card is used to form the interface to the IMS.


Figure 6.1. The IMS Logic Master XL100




6.3 HQ208 Chip Carrier and Daughter Board


The Xilinx 4020E FPGA is contained in a Heat-sinked Quad Flat Pack (HQFP) 208-pin package (Xilinx:10-35). Because the device does not have pins that can be easily inserted into a test circuit board, an adapter from Ironwood Electronics (see Appendix D) is used to mount the FPGA to the test board. The adapter is wire-wrapped to a set of connectors that match up with connectors installed on the IMS socket card. Figure 6.2 shows the completed test unit. Also shown in Figure 6.2 is the Xilinx X-Checker cable for downloading the serial bit stream from the host PC to the FPGA.


Figure  6.2.  Completed  Test  Unit 

There are 16 ground connections and seven +5 Volt connections to the adapter. The power supply is external to the IMS to allow the FPGA to be programmed and to hold its configuration when the IMS is not cycling a test. When the IMS finishes a test and sits idle, it removes all power to the device under test. This would erase the configuration every time the IMS stopped a test cycle, because the configuration is stored in internal latches (Xilinx:13-39). By keeping power supplied to the FPGA, even while idle, the configuration is retained. One possible solution to the loss of configuration is to program a PROM device instead of the FPGA directly. The PROM can then hold the configuration information even when the power is removed, and transfer the data into the FPGA every time the system powers up.

Also connected to the adapter are control pins for the FPGA. The TCK pin is pulled up to Vcc to prevent the device from entering a boundary scan EXTEST during the download process (Xilinx:13-30). The M0, M1, and M2 pins are also pulled up to Vcc to force the device into Slave Serial mode. This mode is the simplest to implement. The Init, Done, Rst, and Prog pins are all pulled up to Vcc. Combining those with the Din and Cclk lines from the X-Checker gives the setup shown in Figure 6.3 (Xilinx:5-18).

The remaining connections represent either inputs or outputs of the FPGA. The Ironwood Electronics data sheet in Appendix D shows the 4020E pin name and number associated with the adapter pin numbers and corresponding IMS connections.

There is a switch wired to the Prog pin to allow a forced reset of the FPGA. This causes the configuration to be erased, and the device then prepares for a new download. The small green LED indicates power to the FPGA from the external supply. The red LED indicates that the IMS has output 5 Volts on the J13 channel.

Figure 6.3. Slave Serial Download


6.4  Functional  Unit  Testing 

The first design that was successfully tested was a combination AND/OR gate utilizing four I/O pins and one CLB out of a total of 784. The AND/OR gate was modeled in VHDL and pushed all the way through to implementation. The fastest speed rating on the gates was 11 ns, or 90.9 MHz.

The second design was the half register file unit from Chapter 5. The only difference in the process the second time was the addition of a UCF constraint file to force the I/O pins to predetermined locations. Even if the model changes and causes a resynthesis of the design, the surrounding environment of the FPGA does not have to change.

The IMS tester allowed for a functionality test and a speed test of the FPGA. For the functional test, the register file is reset and all 16 registers are output to the A and B buses in opposite orders. Figure 6.4 shows the waveforms and indicates that all registers except number 1 are cleared to zero. Recall from Chapter 4 that the number 1 register always holds a numeric 1.0, and the number 0 register always holds a numeric 0.0.

After the registers are cleared, each of the 16 registers is written with a different bit. Once again the two output buses A and B are given the values of each register in opposite order. The waveform shows that both the A and B buses can retrieve the stored information from all registers, with the exception of registers 0 and 1.
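The behaviour exercised by this functional test can be sketched in C. This is only an illustrative software model, not the VHDL register file: the names `RegFile`, `regfile_reset`, `regfile_write`, and `regfile_read` are hypothetical, the 16-bit word with 8 fractional bits (so that 1.0 is 0x0100) is an assumption, and treating writes to registers 0 and 1 as ignored is inferred from the test description rather than taken from the model.

```c
#include <assert.h>
#include <stdint.h>

#define REG_COUNT 16
#define FIXED_ONE 0x0100u /* assumed encoding of 1.0 (8 fractional bits) */

/* Hypothetical software model of the 16-entry register file. */
typedef struct {
    uint16_t r[REG_COUNT];
} RegFile;

/* Reset: every register is cleared, then register 1 is forced to 1.0. */
void regfile_reset(RegFile *rf)
{
    for (int i = 0; i < REG_COUNT; i++) {
        rf->r[i] = 0;
    }
    rf->r[1] = FIXED_ONE;
}

/* Write: registers 0 and 1 are treated as constants (assumption),
   so writes to them are ignored. */
void regfile_write(RegFile *rf, unsigned idx, uint16_t value)
{
    assert(idx < REG_COUNT);
    if (idx >= 2) {
        rf->r[idx] = value;
    }
}

/* Read two registers at once, one onto each output bus. */
void regfile_read(const RegFile *rf, unsigned a_idx, unsigned b_idx,
                  uint16_t *a_bus, uint16_t *b_bus)
{
    assert(a_idx < REG_COUNT && b_idx < REG_COUNT);
    *a_bus = rf->r[a_idx];
    *b_bus = rf->r[b_idx];
}
```

Reading index i on bus A and index 15 - i on bus B for i = 0..15 reproduces the "opposite order" readout described above.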

The speed test is performed by decreasing the IMS clock period until the above functionality test fails. The test fails at 48.5 ns. Because one cycle of the register file spans two cycles of the IMS, the actual failure point is a 97 ns clock period, or 10.3 MHz.
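The arithmetic behind these figures can be checked with a small C sketch. The helper names `device_period_ns` and `period_ns_to_mhz` are hypothetical and exist only for illustration; the numbers 48.5 ns, two tester cycles, and 11 ns come from the text.

```c
#include <assert.h>

/* Period seen by the device when one device cycle spans
   `tester_cycles` cycles of the tester clock. */
double device_period_ns(double tester_period_ns, int tester_cycles)
{
    assert(tester_cycles > 0);
    return tester_period_ns * tester_cycles;
}

/* Convert a clock period in nanoseconds to a frequency in MHz
   (1 / (p * 1e-9 s) = 1000 / p MHz). */
double period_ns_to_mhz(double period_ns)
{
    assert(period_ns > 0.0);
    return 1000.0 / period_ns;
}
```

With these helpers, `device_period_ns(48.5, 2)` gives 97 ns and `period_ns_to_mhz(97.0)` gives roughly 10.3 MHz; the same conversion applied to the 11 ns gate delay gives roughly 90.9 MHz.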

6.5 Conclusions

This chapter showed the physical implementation and electrical verification of only the half-sized register file that was synthesized in Chapter 5. A Xilinx 4020E FPGA was configured from the host PC using a custom adapter board and electrically tested by using the IMS test station.

The entire FKP model could not be implemented because of the size of the design. It would require multiple 4020E FPGAs or possibly one higher-density FPGA, neither of which was available at the time of implementation. However, the success of the half-sized register file indicates that the entire FKP model could also have been implemented successfully, assuming the model is correct and a multi-device partitioner program is available.

7. Conclusions and Recommendations for Future Work


7.1  Conclusions 

The objective of this research was to implement the forward kinematic algorithm for the Utah MIT Dexterous Hand (UMDH) by creating VHDL models and directly synthesizing them into an FPGA. The forward kinematics of the UMDH were developed and analyzed, and the resulting algorithm shows that 12 separate equations, each containing multiple mathematical operations, are needed. If common expressions are shared between equations, a total of 28 operations are required. These shared terms are stored in the register file unit and are sent to either a cosine/sine unit, an adder/subtractor unit, or a multiplier unit as the algorithm proceeds. The input (angles) and output (transformation matrix) are transferred through dedicated I/O buses. The design results in a semi-autonomous Forward Kinematic Processor (FKP) that can calculate the forward kinematics every time the surrounding system issues a run command. The surrounding system does not deal with the intricacies of the algorithm and can tackle other system tasks while the FKP is busy.

It was planned that the entire algorithm would fit into a single FPGA. However, without high-density FPGAs available in the laboratory, only a small portion of the design could be realized in hardware. The register file unit was chosen as the sub-model to implement because it contains combinational logic similar to all the other units plus memory storage. After a few iterations with the floorplanning tools, the register file itself proved to be larger than one 4020E FPGA. The register file unit was reduced to half its size and resynthesized. The new design fit successfully, using 40% of the configurable logic blocks of the 4020E. The design was programmed into a 4020E FPGA and tested using an IMS Logic Master XL. Electrical verification shows an upper bound on the clock frequency of 10.3 MHz, above which the registers begin to hold incorrect data.

7.2  Lessons  Learned 

It can be concluded that small designs can be mapped accurately into the FPGA with short turn-around times. The Xilinx 4020E does not have the capacity that was initially expected and proved to be too small for the entire FKP design. The FKP core model and everything underneath it is completely synthesizable. This required some restrictions on the coding style to avoid multiple signals in sensitivity lists, multiple wait statements in a process, and any reference to a specific delay of time.

7.3  Recommendations 

The first issue to be addressed is the optimization of the VHDL code for synthesis. Some VHDL compilers support the use of in-line macro declarations for instantiation of complete structures, such as fast adders, that are already designed into the device. The use of such structures can not only speed up the design, but also take up less FPGA area. Second, this research focused solely on Xilinx devices. Using other vendors' products, such as Altera's MAX+PLUS II software and their FLEX 10K series of FPGAs, may produce better or worse results. Third, portions of the FKP itself could be redesigned. The multiplier unit uses a 32-bit adder as one of its components. The adder/subtractor unit is 16 bits by itself. The two units could be merged into an ALU, thus eliminating the 16-bit adder and allowing all additions and subtractions to pass through the 32-bit component of the ALU. The increased overhead to choose either multiplication or addition/subtraction should be minimal compared to the area saved by removing the 16-bit adder/subtractor unit. Fourth, investigation into partitioning tools for Xilinx devices may allow the design to be spread across multiple FPGAs. Last, the microstore and controller units are not entirely synthesizable. Both need to be modified to adhere to the synthesis restrictions.
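One way to picture the merged ALU is a single 32-bit adder that also serves 16-bit additions and subtractions. The C sketch below is a software illustration under stated assumptions, not the proposed hardware: the function name `alu_addsub16` is hypothetical, operands are assumed to be 16-bit two's complement values, and subtraction is formed by inverting the second operand and injecting a carry-in of 1.

```c
#include <assert.h>
#include <stdint.h>

/* Route a 16-bit add or subtract through a 32-bit adder.
   subtract = 0 computes a + b, subtract = 1 computes a - b. */
int16_t alu_addsub16(int16_t a, int16_t b, int subtract)
{
    assert(subtract == 0 || subtract == 1);

    uint32_t wa = (uint32_t)(int32_t)a;           /* sign-extend to 32 bits */
    uint32_t wb = (uint32_t)(int32_t)b;
    uint32_t mask = subtract ? 0xFFFFFFFFu : 0u;  /* invert b when subtracting */
    uint32_t sum = wa + (wb ^ mask) + (uint32_t)subtract; /* carry-in = subtract */

    /* Keep the low 16 bits; narrowing back to int16_t relies on the usual
       two's complement wrap of common compilers. */
    return (int16_t)(uint16_t)(sum & 0xFFFFu);
}
```

For example, `alu_addsub16(3, 5, 1)` yields -2 using only the wide adder, which is the sharing the recommendation relies on.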

7.4 Ideas for Future Work

The architecture of the design could be modified to resemble more of a macropipeline structure. The core could be divided into three parts. The first part would calculate the angles needed. The second part would calculate the sines and cosines. The third part would perform the multiplications, additions, and subtractions. The result would be a higher-throughput system, but with a two-stage delay before the first answers appear. In addition, the two data buses, one input and one output, could be merged into a single I/O bus.
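The throughput and latency trade-off of such a macropipeline can be illustrated with a simple count of stage time slots. This is an idealized sketch that assumes every stage takes one equal-length slot and that the pipeline never stalls; the function names are hypothetical.

```c
#include <assert.h>

/* Slots needed when each input passes through all stages
   before the next input starts. */
unsigned long sequential_slots(unsigned long inputs, unsigned stages)
{
    assert(stages > 0);
    return inputs * stages;
}

/* Slots needed when a new input enters the pipeline every slot:
   (stages - 1) slots of fill delay, then one result per slot. */
unsigned long pipelined_slots(unsigned long inputs, unsigned stages)
{
    assert(stages > 0);
    return inputs == 0 ? 0 : inputs + stages - 1;
}
```

For three stages and ten sets of joint angles, the sequential count is 30 slots and the pipelined count is 12, with the first result delayed by the two extra stages.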

The design was based on the idea of each functional unit being a separate state machine that handshakes synchronously with the control unit. This allowed all propagation delays within the CLBs, IOBs, and routing to be ignored. The result is a design that may waste time during a simple stage, because the stage that requires the longest time restricts the rest of the design from going any faster. A possibly better approach would be a more combinational, less state-machine-based design. This would require knowledge of the delays of the circuit as it is placed into the FPGA.

Different algorithms, such as the inverse kinematics of the UMDH or a gross/fine motion controller, could be investigated using the same concepts and procedures developed here.

PROM development for truly portable systems should also be investigated. The PROM device can serially download the configuration of the FPGA every time the system powers up. This property of the FPGA also allows dynamic reconfiguration of parts of the design, allowing the controller of the FKP to swap functional units in and out as needed.

Appendix A: Code for Behavioral Algorithm


A.1  C  code 

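"umdh.h" C code header file

/*------------------------------------------------------------*/
/* umdh.h                                                     */
/*                                                            */
/* Steve Parmley                                              */
/*------------------------------------------------------------*/
/*                                                            */
/* Defines kinematic parameters of umdh thumb manipulators.   */
/*------------------------------------------------------------*/

#define UMDH_A0 (-0.75)
#define UMDH_A1 (0.375)
#define UMDH_A2 (1.7)
#define UMDH_A3 (1.3)
#define UMDH_D1 (3.125)
#define UMDH_D2 (0.0)
#define UMDH_D3 (0.0)
#define UMDH_D4 (0.0)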

"range.c" C code

/*------------------------------------------------------------*/
/* range.c                                                    */
/*                                                            */
/* Steve Parmley - UMDH forward kinematic function            */
/*                                                            */
/* Compute forward kinematics given current joint positions   */
/* and writes all temp values to disk                         */
/*                                                            */
/* Compile with gcc range.c -lm                               */
/*------------------------------------------------------------*/

/* include files */
#include <math.h>
#include "umdh.h"
#include <stdio.h>
#include <stdlib.h>

/*------------------------------------------------------------*/
/* umdhFwdKin  Compute forward kinematics.                    */
/*------------------------------------------------------------*/
void umdhFwdKin(float *jtang, float *noap, FILE *rangeptr)
{
    float a0, a1, a2, a3, d1, d2, d3, d4;
    float c1, c2, c3, c4;
    float s1, s2, s3, s4;
    float c23, s23, c234, s234;

    a0 = UMDH_A0;
    a1 = UMDH_A1;
    a2 = UMDH_A2;
    a3 = UMDH_A3;
    d1 = UMDH_D1;
    d2 = UMDH_D2;
    d3 = UMDH_D3;
    d4 = UMDH_D4;

    s1 = sin(jtang[0]);  c1 = cos(jtang[0]);
    s2 = sin(jtang[1]);  c2 = cos(jtang[1]);
    s3 = sin(jtang[2]);  c3 = cos(jtang[2]);
    s4 = sin(jtang[3]);  c4 = cos(jtang[3]);
    s23 = s2*c3 + c2*s3; c23 = c2*c3 - s2*s3;
    s234 = sin(jtang[1]+jtang[2]+jtang[3]);
    c234 = cos(jtang[1]+jtang[2]+jtang[3]);

    /* write intermediate values so their numeric range can be examined */
    fprintf(rangeptr,"%f\n%f\n%f\n%f\n%f\n%f\n%f\n%f\n",s1,s2,s3,s4,c1,c2,c3,c4);
    /* four values are written, so four conversions are used */
    fprintf(rangeptr,"%f\n%f\n%f\n%f\n",s2*c3,c2*s3,s23,c23);

    /* n vector */
    noap[0] = c1*c234;
    noap[1] = s1*c234;
    noap[2] = s234;

    /* o vector */
    noap[3] = -c1*s234;
    noap[4] = -s1*s234;
    noap[5] = c234;

    /* a vector */
    noap[6] = s1;
    noap[7] = -c1;
    noap[8] = 0.0;

    /* p vector */
    noap[9]  = a0 + c1*(a1 + a2*c2 + a3*c23);
    noap[10] = s1*(a1 + a2*c2 + a3*c23);
    noap[11] = a2*s2 + a3*s23 + d1;

    fprintf(rangeptr,"%f\n%f\n%f\n%f\n%f\n",a3*c23,
            a2*c2,
            a1+a2*c2+a3*c23,
            c1*(a1+a2*c2+a3*c23),
            s1*(a1+a2*c2+a3*c23));

    return;
}

int main(void)
{
    FILE *fp;
    FILE *rangeptr;
    float jtang[6];
    float noap[12];
    float step = 3.1415 / 8.0;

    fp = fopen("fwdkin.dat","w");
    rangeptr = fopen("range.dat","w");

    for (jtang[0]=-3.1415/4.0; jtang[0] < 3.1415 / 4.0*30; jtang[0]=jtang[0]+step)
     for (jtang[1]=0.0; jtang[1] < 3.1415 / 3.0; jtang[1]=jtang[1]+step)
      for (jtang[2]=0.0; jtang[2] < 3.1415 / 2.0; jtang[2]=jtang[2]+step)
       for (jtang[3]=0.0; jtang[3] < 3.1415 / 2.0; jtang[3]=jtang[3]+step)
       {
           umdhFwdKin(jtang,noap,rangeptr);
           /* one row of joint angles, then the n, o, a, p columns row by row */
           fprintf(fp,"%f\t%f\t%f\t%f\n",jtang[0],jtang[1],jtang[2],jtang[3]);
           fprintf(fp,"%f\t%f\t%f\t%f\n",noap[0],noap[3],noap[6],noap[9]);
           fprintf(fp,"%f\t%f\t%f\t%f\n",noap[1],noap[4],noap[7],noap[10]);
           fprintf(fp,"%f\t%f\t%f\t%f\n\n",noap[2],noap[5],noap[8],noap[11]);
       }

    fclose(fp);
    fclose(rangeptr);
    return 0;
}



A.2  Matlab  code 


"fk.m" Matlab code

% Steve Parmley %
% Matlab code that loads data generated by C code %
% Plots positions of last joint and arc of fingertip %

clear;
close all;
load fwdkin.dat;
for i=1:599,
    nx(i)=fwdkin(i*4+2,1);
    ny(i)=fwdkin(i*4+3,1);
    nz(i)=fwdkin(i*4+4,1);
    ox(i)=fwdkin(i*4+2,2);
    oy(i)=fwdkin(i*4+3,2);
    oz(i)=fwdkin(i*4+4,2);
    ax(i)=fwdkin(i*4+2,3);
    ay(i)=fwdkin(i*4+3,3);
    az(i)=fwdkin(i*4+4,3);
    px(i)=fwdkin(i*4+2,4);
    py(i)=fwdkin(i*4+3,4);
    pz(i)=fwdkin(i*4+4,4);
    ppx(i) = px(i) + nx(i) * 1.125;
    ppy(i) = py(i) + ny(i) * 1.125;
    ppz(i) = pz(i) + nz(i) * 1.125;
end;

for i=1:24,
    px1(i) = px(i);
    py1(i) = py(i);
    pz1(i) = pz(i);
    ppx1(i) = ppx(i);
    ppy1(i) = ppy(i);
    ppz1(i) = ppz(i);

    px2(i) = px(i+24);
    py2(i) = py(i+24);
    pz2(i) = pz(i+24);
    ppx2(i) = ppx(i+24);
    ppy2(i) = ppy(i+24);
    ppz2(i) = ppz(i+24);

    px3(i) = px(i+49);
    py3(i) = py(i+49);
    pz3(i) = pz(i+49);
    ppx3(i) = ppx(i+49);
    ppy3(i) = ppy(i+49);
    ppz3(i) = ppz(i+49);

    px4(i) = px(i+74);
    py4(i) = py(i+74);
    pz4(i) = pz(i+74);
    ppx4(i) = ppx(i+74);
    ppy4(i) = ppy(i+74);
    ppz4(i) = ppz(i+74);

    px5(i) = px(i+149);
    py5(i) = py(i+149);
    pz5(i) = pz(i+149);
    ppx5(i) = ppx(i+149);
    ppy5(i) = ppy(i+149);
    ppz5(i) = ppz(i+149);

    px6(i) = px(i+224);
    py6(i) = py(i+224);
    pz6(i) = pz(i+224);
    ppx6(i) = ppx(i+224);
    ppy6(i) = ppy(i+224);
    ppz6(i) = ppz(i+224);

    px7(i) = px(i+299);
    py7(i) = py(i+299);
    pz7(i) = pz(i+299);
    ppx7(i) = ppx(i+299);
    ppy7(i) = ppy(i+299);
    ppz7(i) = ppz(i+299);
end;


plot3(ppx1,ppy1,ppz1,'-',px1,py1,pz1,'+',ppx2,ppy2,ppz2,'-',px2,py2,pz2,'*',ppx3,ppy3,ppz3,'-.',px3,py3,pz3,'x');
grid; 

view(^5,10); 
axis([-3 3 -6 0 1 7]);
title('UMDH Thumb Motion (joint 0 fixed)');
h=legend('Fingertip Positions (Joint 2 Location A)','Joint 3 Positions (Joint 2 Location A)','Fingertip Positions (Joint 2 Location B)','Joint 3 Positions (Joint 2 Location B)','Fingertip Positions (Joint 2 Location C)','Joint 3 Positions (Joint 2 Location C)');
axes(h);

"range.m" Matlab code

% Steve Parmley %
% Matlab code that loads data generated by C code %
% Plots positions of last joint and arc of fingertip %

load fwdkin.dat;
max(fwdkin)
min(fwdkin)

load range.dat;
max(range)
min(range)

Appendix B: VHDL Functional Unit Models
and Simulation Testbenches


B.1 Cosine/Sine Unit

B.1.1 Cosine/Sine Model


-- Project:                   Thesis
-- Filename:                  cos_sin.vhd
-- Other files required:
-- Date:                      Sept 19 97
-- Entity/Architecture Name:  cos_sin_e/behavior
-- Developer:                 Steve Parmley

library IEEE;
use IEEE.std_logic_1164.all;

entity cos_sin_e is
  port (cos_sin_reset    : in  std_ulogic;
        cos_sin_clk      : in  std_ulogic;
        cos_sin_A_bus    : in  std_ulogic_vector(15 downto 0);
        cos_sin_go       : in  std_ulogic;
        cos_sin_sel      : in  std_ulogic;
        cos_sin_wait     : in  std_ulogic_vector(2 downto 0);
        cos_sin_ready    : out std_ulogic;
        cos_sin_C_bus    : out std_ulogic_vector(15 downto 0);
        -- the following describes the connection to the rom
        cos_sin_rom_addr : out std_ulogic_vector(12 downto 0);
        cos_sin_rom_data : in  std_ulogic_vector(15 downto 0));
end cos_sin_e;

architecture behavior of cos_sin_e is
begin

  lookup: process
    variable state : integer;
    variable wait_count, wait_counter : integer;
    -- create sinks for four bits not used of A_bus
    variable temp1, temp2, temp3, temp4 : std_ulogic;
  begin

    if cos_sin_reset = '1' then
      state := 0;
    end if;

    wait until (cos_sin_clk'event and cos_sin_clk='1');

    if state = 0 then
      -- turn off all signals
      cos_sin_ready <= '0';

      -- calculate how many waits
      wait_count := 0;
      wait_counter := 0;
      if cos_sin_wait(0) = '1' then
        wait_count := wait_count + 1;
      end if;
      if cos_sin_wait(1) = '1' then
        wait_count := wait_count + 2;
      end if;
      if cos_sin_wait(2) = '1' then
        wait_count := wait_count + 4;
      end if;

      -- copy over lower 8 decimal bits and 3 LSBs of integer
      cos_sin_rom_addr(10 downto 0) <= cos_sin_A_bus(10 downto 0);
      -- copy in sign bit
      cos_sin_rom_addr(11) <= cos_sin_A_bus(15);
      -- copy in selector bit for cos or sin function
      cos_sin_rom_addr(12) <= cos_sin_sel;

      -- sink the 4 unused bits
      temp1 := cos_sin_A_bus(11);
      temp2 := cos_sin_A_bus(12);
      temp3 := cos_sin_A_bus(13);
      temp4 := cos_sin_A_bus(14);

      -- wait for go signal
      if cos_sin_go = '1' then
        state := 1;
      end if;
    end if;

    if state = 1 then
      -- induce rom wait states for slower external devices
      if wait_count = wait_counter then
        state := 2;
      else
        wait_counter := wait_counter + 1;
      end if;
    end if;

    if state = 2 then
      -- latch data
      cos_sin_C_bus <= cos_sin_rom_data;
      -- indicate to control that the information is latched
      cos_sin_ready <= '1';
      -- wait one cycle
      state := 3;
    elsif state = 3 then
      -- turn off ready signal
      cos_sin_ready <= '0';
      -- start over
      state := 0;
    end if;

  end process lookup;
end behavior;
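The ROM address formed in state 0 of the model above can be mirrored in C to make the bit layout explicit. This is only an illustrative software sketch of the three address assignments in the listing; the function name `cos_sin_rom_addr13` is hypothetical and not part of the VHDL design.

```c
#include <assert.h>
#include <stdint.h>

/* Build the 13-bit ROM address from the 16-bit A bus and the
   cosine/sine selector, following the assignments in the model:
     addr(10 downto 0) = A_bus(10 downto 0)
     addr(11)          = A_bus(15)  (sign bit)
     addr(12)          = sel
   A_bus bits 14..11 are not used. */
uint16_t cos_sin_rom_addr13(uint16_t a_bus, int sel)
{
    assert(sel == 0 || sel == 1);

    uint16_t addr = (uint16_t)(a_bus & 0x07FFu);      /* bits 10..0 */
    addr |= (uint16_t)(((a_bus >> 15) & 1u) << 11);   /* sign bit   */
    addr |= (uint16_t)((unsigned)sel << 12);          /* selector   */
    return addr;
}
```

For example, an A bus value of 0x8001 with the selector set maps to address 0x1801, while bits 14 through 11 of the A bus have no effect on the result.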

B.1.2 Cosine/Sine Testbench


-- Project:                   Thesis
-- Filename:                  cos_sin-bench.vhd
-- Other files required:
-- Date:                      sept 19 97
-- Entity/Architecture Name:  cos_sin_tb/test
-- Developer:                 Steve Parmley

library IEEE;
use IEEE.std_logic_1164.all;

entity cos_sin_tb is
end cos_sin_tb;

architecture test of cos_sin_tb is

  component cos_sin_e
    port (cos_sin_reset    : in  std_ulogic;
          cos_sin_clk      : in  std_ulogic;
          cos_sin_A_bus    : in  std_ulogic_vector(15 downto 0);
          cos_sin_go       : in  std_ulogic;
          cos_sin_sel      : in  std_ulogic;
          cos_sin_wait     : in  std_ulogic_vector(2 downto 0);
          cos_sin_ready    : out std_ulogic;
          cos_sin_C_bus    : out std_ulogic_vector(15 downto 0);
          -- the following describes the connection to the rom
          cos_sin_rom_addr : out std_ulogic_vector(12 downto 0);
          cos_sin_rom_data : in  std_ulogic_vector(15 downto 0));
  end component;

  signal sys_reset, sys_clk, go, sel, ready : std_ulogic := '0';
  signal waits       : std_ulogic_vector(2 downto 0) := "000";
  signal angle_in    : std_ulogic_vector(15 downto 0);
  signal result      : std_ulogic_vector(15 downto 0);
  signal rom_address : std_ulogic_vector(12 downto 0);
  signal rom_result  : std_ulogic_vector(15 downto 0);

begin

  U1 : cos_sin_e
    PORT MAP (sys_reset,
              sys_clk,
              angle_in,
              go,
              sel,
              waits,
              ready,
              result,
              rom_address,
              rom_result);

  clock : process
  begin
    sys_clk <= not(sys_clk);
    wait for 10 ps;
  end process clock;

  rst : process
  begin
    sys_reset <= '1';
    wait for 5 ps;
    sys_reset <= '0';
    wait for 15000 ps;
  end process rst;

  exercise : process
    variable wait_count : integer := 0;
  begin

    -- do it again with more waits
    case wait_count is
      when 0 => waits <= "000";
      when 1 => waits <= "001";
      when 2 => waits <= "010";
      when 3 => waits <= "011";
      when 4 => waits <= "100";
      when 5 => waits <= "101";
      when 6 => waits <= "110";
      when 7 => waits <= "111";
      when others => wait until sys_clk'event and sys_clk='1';
                     wait until sys_clk'event and sys_clk='1';
                     wait until sys_clk'event and sys_clk='1';
                     wait until sys_clk'event and sys_clk='1';
                     wait until sys_clk'event and sys_clk='1';
                     wait until sys_clk'event and sys_clk='1';
                     ASSERT false
                       REPORT "DONE"
                       SEVERITY failure;
    end case;

    wait_count := wait_count + 1;

    wait until sys_clk'event and sys_clk='0';

    -- processor is setting up input bus
    angle_in(15 downto 1) <= "000100100011010";
    angle_in(0) <= waits(0);

    -- set selection to sin or cos
    sel <= waits(0);

    -- wait for a while
    wait until sys_clk'event and sys_clk='1';

    -- and initiate function
    go <= '1';

    -- wait for function to report ready
    wait until ready = '1' and ready'event;
    wait until sys_clk'event and sys_clk='1';

    -- turn off go signal
    go <= '0';

  end process exercise;

  rom : process
  begin
    wait until rom_address'event;
    -- make up some rom data (inverse of the address for now)
    rom_result(12 downto 0) <= not(rom_address(12 downto 0));
    -- fill in the rest
    rom_result(15 downto 13) <= "111";
  end process rom;

end test;

CONFIGURATION cos_sin_c OF cos_sin_tb IS
  FOR test
    FOR ALL : cos_sin_e
      USE ENTITY WORK.cos_sin_e(behavior);
    END FOR;
  END FOR;
END cos_sin_c;


B.1.3  Cosine/Sine  Results 

B.2 Adder/Subtractor Unit

B.2.1  Adder/Subtractor  Model 


-  Project: 

-  Filename: 

-Other files  required: 

-Date: 

-  Entity/Architecture  Name: 
-Developer: 

-  Function: 

-  Limitations: 

-  History; 

-Last  Analyzed  On: 

Thesis 

adder.vhd 

sept  30  97 
adder_e^behavior 

Steve  Parmley 

library  IEEE; 

use  IEEE.stdJogicJ164.all; 

entity  adder je  is 

port  (adder_reset 

in 

std_ulogic; 

adder  elk 

in 

std_ulogic; 

adder  A  bus 

In 

std_ulogic_vector(15  downto  0); 

adder_B_bus 

in 

std_ulogic_vector(15  downto  0); 

adderjgo 

in 

std^ulogic; 

adder_sel 

in 

stdjiogic; 

adder  done 

out 

std_u!ogic; 

adder_C_bus 

out 

std_ulogic_vector(15  downto  0)) 

end  adderje; 

architecture  behavior  of  adder_e  is 

Signal  state :  integer; 

Signal  Bxor :  std_ulogic„v0ctor{15  downto  0); 
Signal  Cout :  std_ulogic_vector(15  downto  0); 
Signal  SUM ;  std_u!ogic_vector(15  downto 0); 

begin 

addsub :  process 
begin 

wait  until  adder_clk'event  and  adder_c!k='1'; 


if  adder_reset  =  '1'  then 
state  <=  0; 

end  if; 

if  adder jgo  =  ’1'  then 


if  state  =  0  then 

Bxor(0)  <=  adder_B_bus(0)  xor  adder^sel; 
Bxor(1)  <=  adder_B_bus{1)  xor  adder_sel; 
BxDr(2)  <=  adder_B_bus(2)  xor  adder_sel; 
Bxor(3)  <=  adderlB_bus(3)  xor  adder_sel; 
Bxor(4)  <=  adder_B_bus(4)  xor  adder_sel; 
Bxor(5)  <=  adder_B_bus(5)  xor  adder^sel; 
Bxor(6)  <=  adder_B_bus(6)  xor  adder^sel; 
Bxor(7)  <=  adder_Blbus(7)  xor  adder_sel; 
Bxor(8)  <-  adder_B_bus(8)  xor  adder_sei; 
Bxor(9)  <=  adder_B_bus(9)  xor  adder_sel; 
Bxor(10)  <=  adder_B_bus(10)  xor  adder_sel; 
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Bxor(1 1)  <=  adcler_B_bus(1 1)  xor  adder_sel; 

Bxor(12)  <-  aclder_B_bus(12)  xor  adder^sel; 

Bxor(13)  <=  adder_Blbus(13)  xor  adder^sel; 

Bxor(14)  <=  adder_B_bus(14)  xor  adder_sel; 

Bxor(15)  <=  adder_B_bus(15)  xor  adder_sel; 
state  <=1; 

      -- state 1: carry out of bit 0, with adder_sel as the carry-in
      elsif state = 1 then
        Cout(0) <= ((adder_A_bus(0) and Bxor(0)) or (adder_sel and (adder_A_bus(0) or Bxor(0))));
        state <= 2;

      -- states 2 to 17: one sum bit and the next carry per clock
      elsif state = 2 then
        SUM(0)  <= ((adder_A_bus(0) and Bxor(0) and adder_sel) or ((adder_A_bus(0) or Bxor(0) or adder_sel) and (not Cout(0))));
        Cout(1) <= ((adder_A_bus(1) and Bxor(1)) or (Cout(0) and (adder_A_bus(1) or Bxor(1))));
        state <= 3;

      elsif state = 3 then
        SUM(1)  <= ((adder_A_bus(1) and Bxor(1) and Cout(0)) or ((adder_A_bus(1) or Bxor(1) or Cout(0)) and (not Cout(1))));
        Cout(2) <= ((adder_A_bus(2) and Bxor(2)) or (Cout(1) and (adder_A_bus(2) or Bxor(2))));
        state <= 4;

      elsif state = 4 then
        SUM(2)  <= ((adder_A_bus(2) and Bxor(2) and Cout(1)) or ((adder_A_bus(2) or Bxor(2) or Cout(1)) and (not Cout(2))));
        Cout(3) <= ((adder_A_bus(3) and Bxor(3)) or (Cout(2) and (adder_A_bus(3) or Bxor(3))));
        state <= 5;

      elsif state = 5 then
        SUM(3)  <= ((adder_A_bus(3) and Bxor(3) and Cout(2)) or ((adder_A_bus(3) or Bxor(3) or Cout(2)) and (not Cout(3))));
        Cout(4) <= ((adder_A_bus(4) and Bxor(4)) or (Cout(3) and (adder_A_bus(4) or Bxor(4))));
        state <= 6;

      elsif state = 6 then
        SUM(4)  <= ((adder_A_bus(4) and Bxor(4) and Cout(3)) or ((adder_A_bus(4) or Bxor(4) or Cout(3)) and (not Cout(4))));
        Cout(5) <= ((adder_A_bus(5) and Bxor(5)) or (Cout(4) and (adder_A_bus(5) or Bxor(5))));
        state <= 7;

      elsif state = 7 then
        SUM(5)  <= ((adder_A_bus(5) and Bxor(5) and Cout(4)) or ((adder_A_bus(5) or Bxor(5) or Cout(4)) and (not Cout(5))));
        Cout(6) <= ((adder_A_bus(6) and Bxor(6)) or (Cout(5) and (adder_A_bus(6) or Bxor(6))));
        state <= 8;

      elsif state = 8 then
        SUM(6)  <= ((adder_A_bus(6) and Bxor(6) and Cout(5)) or ((adder_A_bus(6) or Bxor(6) or Cout(5)) and (not Cout(6))));
        Cout(7) <= ((adder_A_bus(7) and Bxor(7)) or (Cout(6) and (adder_A_bus(7) or Bxor(7))));
        state <= 9;

      elsif state = 9 then
        SUM(7)  <= ((adder_A_bus(7) and Bxor(7) and Cout(6)) or ((adder_A_bus(7) or Bxor(7) or Cout(6)) and (not Cout(7))));
        Cout(8) <= ((adder_A_bus(8) and Bxor(8)) or (Cout(7) and (adder_A_bus(8) or Bxor(8))));
        state <= 10;

      elsif state = 10 then
        SUM(8)  <= ((adder_A_bus(8) and Bxor(8) and Cout(7)) or ((adder_A_bus(8) or Bxor(8) or Cout(7)) and (not Cout(8))));
        Cout(9) <= ((adder_A_bus(9) and Bxor(9)) or (Cout(8) and (adder_A_bus(9) or Bxor(9))));
        state <= 11;

      elsif state = 11 then

        SUM(9)   <= ((adder_A_bus(9) and Bxor(9) and Cout(8)) or ((adder_A_bus(9) or Bxor(9) or Cout(8)) and (not Cout(9))));
        Cout(10) <= ((adder_A_bus(10) and Bxor(10)) or (Cout(9) and (adder_A_bus(10) or Bxor(10))));
        state <= 12;

      elsif state = 12 then
        SUM(10)  <= ((adder_A_bus(10) and Bxor(10) and Cout(9)) or ((adder_A_bus(10) or Bxor(10) or Cout(9)) and (not Cout(10))));
        Cout(11) <= ((adder_A_bus(11) and Bxor(11)) or (Cout(10) and (adder_A_bus(11) or Bxor(11))));
        state <= 13;

      elsif state = 13 then
        SUM(11)  <= ((adder_A_bus(11) and Bxor(11) and Cout(10)) or ((adder_A_bus(11) or Bxor(11) or Cout(10)) and (not Cout(11))));
        Cout(12) <= ((adder_A_bus(12) and Bxor(12)) or (Cout(11) and (adder_A_bus(12) or Bxor(12))));
        state <= 14;

      elsif state = 14 then
        SUM(12)  <= ((adder_A_bus(12) and Bxor(12) and Cout(11)) or ((adder_A_bus(12) or Bxor(12) or Cout(11)) and (not Cout(12))));
        Cout(13) <= ((adder_A_bus(13) and Bxor(13)) or (Cout(12) and (adder_A_bus(13) or Bxor(13))));
        state <= 15;

      elsif state = 15 then
        SUM(13)  <= ((adder_A_bus(13) and Bxor(13) and Cout(12)) or ((adder_A_bus(13) or Bxor(13) or Cout(12)) and (not Cout(13))));
        Cout(14) <= ((adder_A_bus(14) and Bxor(14)) or (Cout(13) and (adder_A_bus(14) or Bxor(14))));
        state <= 16;

      elsif state = 16 then
        SUM(14)  <= ((adder_A_bus(14) and Bxor(14) and Cout(13)) or ((adder_A_bus(14) or Bxor(14) or Cout(13)) and (not Cout(14))));
        Cout(15) <= ((adder_A_bus(15) and Bxor(15)) or (Cout(14) and (adder_A_bus(15) or Bxor(15))));
        state <= 17;

      elsif state = 17 then
        SUM(15)  <= ((adder_A_bus(15) and Bxor(15) and Cout(14)) or ((adder_A_bus(15) or Bxor(15) or Cout(14)) and (not Cout(15))));
        state <= 18;

      -- state 18: present the result and flag completion
      elsif state = 18 then
        adder_C_bus <= SUM;
        adder_done <= '1';
      end if;

    else
      adder_done <= '0';
      state <= 0;
    end if;

  end process addsub;
end behavior;
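
The explicit states of adder_e evaluate one ripple-carry stage per clock. As a cross-check on those equations, the same carry and sum expressions can be written as a loop inside a pure function. This is a minimal sketch and not part of the original design: the package name adder_ref_pkg and the function name addsub_ref are hypothetical, and the function is purely combinational, so it does not model the go/done handshake or the multi-cycle timing of adder_e.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- Hypothetical reference package (an assumption, not in the thesis listings).
package adder_ref_pkg is
  function addsub_ref (a, b : std_ulogic_vector(15 downto 0);
                       sel  : std_ulogic)
    return std_ulogic_vector;
end adder_ref_pkg;

package body adder_ref_pkg is
  function addsub_ref (a, b : std_ulogic_vector(15 downto 0);
                       sel  : std_ulogic)
    return std_ulogic_vector is
    variable bx    : std_ulogic;                      -- B bit after conditional inversion
    variable carry : std_ulogic;                      -- carry into the current bit
    variable cnext : std_ulogic;                      -- carry out of the current bit
    variable sum   : std_ulogic_vector(15 downto 0);
  begin
    carry := sel;  -- sel = '1' (subtract) supplies the +1 of the two's complement
    for i in 0 to 15 loop
      bx     := b(i) xor sel;
      -- same carry expression as Cout(i) in adder_e
      cnext  := (a(i) and bx) or (carry and (a(i) or bx));
      -- same sum expression as SUM(i) in adder_e
      sum(i) := (a(i) and bx and carry) or ((a(i) or bx or carry) and (not cnext));
      carry  := cnext;
    end loop;
    return sum;
  end addsub_ref;
end adder_ref_pkg;
```

With the package made visible by a use clause, a testbench could compare the two after done rises, for example with `assert result = addsub_ref(A, B, sel) report "adder mismatch" severity error;`.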


B.2.2  Adder/Subtractor  Testbench 


-- Project: Thesis
-- Filename: adder-bench.vhd
-- Other files required:
-- Date: sept 30 97
-- Entity/Architecture Name: adder_tb/test
-- Developer: Steve Parmley
-- Function:
-- Limitations:
-- History:
-- Last Analyzed On:

library IEEE;
use IEEE.std_logic_1164.all;

entity adder_tb is
end adder_tb;

architecture test of adder_tb is

  constant Atest00 : std_ulogic_vector(15 downto 0) := "0000000000000000";
  constant Atest01 : std_ulogic_vector(15 downto 0) := "0000000000000001";
  constant Atest02 : std_ulogic_vector(15 downto 0) := "0000000000000010";
  constant Atest03 : std_ulogic_vector(15 downto 0) := "0000000000000011";
  constant Atest04 : std_ulogic_vector(15 downto 0) := "0101010101010101";
  constant Atest05 : std_ulogic_vector(15 downto 0) := "1010101010101010";
  constant Atest06 : std_ulogic_vector(15 downto 0) := "1111111111111110";
  constant Atest07 : std_ulogic_vector(15 downto 0) := "1111111101111111";
  constant Atest08 : std_ulogic_vector(15 downto 0) := "0111111111111111";
  constant Atest09 : std_ulogic_vector(15 downto 0) := "1111111111111111";

  constant Btest00 : std_ulogic_vector(15 downto 0) := "0000000000000000";  -- +/- 0
  constant Btest01 : std_ulogic_vector(15 downto 0) := "0000000000000001";  -- +/- 1
  constant Btest02 : std_ulogic_vector(15 downto 0) := "0000000000000010";  -- +/- 2
  constant Btest03 : std_ulogic_vector(15 downto 0) := "0000000100000000";  -- +/- 256
  constant Btest04 : std_ulogic_vector(15 downto 0) := "1000000000000000";  -- +/- 32K
  constant Btest05 : std_ulogic_vector(15 downto 0) := "1111111111111111";  -- +/- 65534

  constant add : std_ulogic := '0';
  constant sub : std_ulogic := '1';

  component adder_e
    port (adder_reset : in  std_ulogic;
          adder_clk   : in  std_ulogic;
          adder_A_bus : in  std_ulogic_vector(15 downto 0);
          adder_B_bus : in  std_ulogic_vector(15 downto 0);
          adder_go    : in  std_ulogic;
          adder_sel   : in  std_ulogic;
          adder_done  : out std_ulogic;
          adder_C_bus : out std_ulogic_vector(15 downto 0));
  end component;

  signal sys_clk, sys_reset, go, sel, done : std_ulogic := '0';
  signal A, B, result : std_ulogic_vector(15 downto 0);

begin

  U1 : adder_e
    PORT MAP (sys_reset,
              sys_clk,
              A,
              B,
              go,
              sel,
              done,
              result);



clock :  process 
begin 

sys_clk  <=  not(sys_clk); 
wait  for  10  ps; 
end  process  clock; 


exercise : process
  variable inputA, inputB : std_ulogic_vector(15 downto 0);
begin

  sys_reset <= '0';

  for i in 0 to 1 loop
    -- add or sub
    CASE i IS
      WHEN 0 => sel <= add;
      WHEN 1 => sel <= sub;
    END CASE;

    for j in 0 to 9 loop
      for l in 0 to 5 loop
        -- pick a test
        CASE j IS
          WHEN 0 => inputA := Atest00;
          WHEN 1 => inputA := Atest01;
          WHEN 2 => inputA := Atest02;
          WHEN 3 => inputA := Atest03;
          WHEN 4 => inputA := Atest04;
          WHEN 5 => inputA := Atest05;
          WHEN 6 => inputA := Atest06;
          WHEN 7 => inputA := Atest07;
          WHEN 8 => inputA := Atest08;
          WHEN 9 => inputA := Atest09;
        END CASE;

        CASE l IS
          WHEN 0 => inputB := Btest00;
          WHEN 1 => inputB := Btest01;
          WHEN 2 => inputB := Btest02;
          WHEN 3 => inputB := Btest03;
          WHEN 4 => inputB := Btest04;
          WHEN 5 => inputB := Btest05;
        END CASE;

        go <= '0';
        wait until done = '0';

        -- drive the operand buses bit by bit
        FOR k IN 0 TO 15 loop
          A(k) <= inputA(k);
          B(k) <= inputB(k);
        end loop;

        wait until sys_clk'event and sys_clk = '0';
        go <= '1';
        wait until done = '1';

        go <= '0';
      end loop;
    end loop;
  end loop;


  wait until sys_clk'event and sys_clk = '0';
  wait until sys_clk'event and sys_clk = '0';
  wait until sys_clk'event and sys_clk = '0';
  wait until sys_clk'event and sys_clk = '0';
  wait until sys_clk'event and sys_clk = '0';
  wait until sys_clk'event and sys_clk = '0';
  wait until sys_clk'event and sys_clk = '0';
  wait until sys_clk'event and sys_clk = '0';
  wait until sys_clk'event and sys_clk = '0';
  wait until sys_clk'event and sys_clk = '0';
  wait until sys_clk'event and sys_clk = '0';
  wait until sys_clk'event and sys_clk = '0';
  wait until sys_clk'event and sys_clk = '0';
  wait until sys_clk'event and sys_clk = '0';
  wait until sys_clk'event and sys_clk = '0';
  wait until sys_clk'event and sys_clk = '0';
  wait until sys_clk'event and sys_clk = '0';
  wait until sys_clk'event and sys_clk = '0';


ASSERT  false 

REPORT  "DONE" 
SEVERITY  failure; 
end  process  exercise; 
end test;

CONFIGURATION adder_c OF adder_tb IS
  FOR test
    FOR ALL: adder_e
      USE ENTITY WORK.adder_e(behavior);
    END FOR;
  END FOR;
END adder_c;


B.2.3 Adder/Subtractor Results

-- Project: Thesis
-- Filename: adder32.vhd
-- Other files required:
-- Date: sept 30 97
-- Entity/Architecture Name: adder32_e/behavior
-- Developer: Steve Parmley
-- Function:
-- Limitations:
-- History:
-- Last Analyzed On:


library IEEE;
use IEEE.std_logic_1164.all;

entity adder32_e is
  port (adder_reset : in  std_ulogic;
        adder_clk   : in  std_ulogic;
        adder_A_bus : in  std_ulogic_vector(31 downto 0);
        adder_B_bus : in  std_ulogic_vector(31 downto 0);
        adder_go    : in  std_ulogic;
        adder_sel   : in  std_ulogic;
        adder_done  : out std_ulogic;
        adder_C_bus : out std_ulogic_vector(31 downto 0));
end adder32_e;

architecture behavior of adder32_e is

  signal state : integer;
  signal Bxor  : std_ulogic_vector(31 downto 0);
  signal Cout  : std_ulogic_vector(31 downto 0);
  signal SUM   : std_ulogic_vector(31 downto 0);

begin

  addsub : process
  begin
    wait until adder_clk'event and adder_clk = '1';

    if adder_reset = '1' then
      state <= 0;
    end if;

    if adder_go = '1' then

      -- state 0: conditionally invert B (adder_sel = '1' selects subtraction)
      if state = 0 then
        Bxor(0)  <= adder_B_bus(0)  xor adder_sel;
        Bxor(1)  <= adder_B_bus(1)  xor adder_sel;
        Bxor(2)  <= adder_B_bus(2)  xor adder_sel;
        Bxor(3)  <= adder_B_bus(3)  xor adder_sel;
        Bxor(4)  <= adder_B_bus(4)  xor adder_sel;
        Bxor(5)  <= adder_B_bus(5)  xor adder_sel;
        Bxor(6)  <= adder_B_bus(6)  xor adder_sel;
        Bxor(7)  <= adder_B_bus(7)  xor adder_sel;
        Bxor(8)  <= adder_B_bus(8)  xor adder_sel;
        Bxor(9)  <= adder_B_bus(9)  xor adder_sel;
        Bxor(10) <= adder_B_bus(10) xor adder_sel;
        Bxor(11) <= adder_B_bus(11) xor adder_sel;
        Bxor(12) <= adder_B_bus(12) xor adder_sel;
        Bxor(13) <= adder_B_bus(13) xor adder_sel;
        Bxor(14) <= adder_B_bus(14) xor adder_sel;
        Bxor(15) <= adder_B_bus(15) xor adder_sel;
        Bxor(16) <= adder_B_bus(16) xor adder_sel;
        Bxor(17) <= adder_B_bus(17) xor adder_sel;
        Bxor(18) <= adder_B_bus(18) xor adder_sel;
        Bxor(19) <= adder_B_bus(19) xor adder_sel;
        Bxor(20) <= adder_B_bus(20) xor adder_sel;
        Bxor(21) <= adder_B_bus(21) xor adder_sel;
        Bxor(22) <= adder_B_bus(22) xor adder_sel;
        Bxor(23) <= adder_B_bus(23) xor adder_sel;
        Bxor(24) <= adder_B_bus(24) xor adder_sel;
        Bxor(25) <= adder_B_bus(25) xor adder_sel;
        Bxor(26) <= adder_B_bus(26) xor adder_sel;
        Bxor(27) <= adder_B_bus(27) xor adder_sel;
        Bxor(28) <= adder_B_bus(28) xor adder_sel;
        Bxor(29) <= adder_B_bus(29) xor adder_sel;
        Bxor(30) <= adder_B_bus(30) xor adder_sel;
        Bxor(31) <= adder_B_bus(31) xor adder_sel;
        state <= 1;

      -- state 1: carry out of bit 0, with adder_sel as the carry-in
      elsif state = 1 then
        Cout(0) <= ((adder_A_bus(0) and Bxor(0)) or (adder_sel and (adder_A_bus(0) or Bxor(0))));
        state <= state + 1;

      -- state 2: sum bit 0 (carry-in is adder_sel) and carry out of bit 1
      elsif state = 2 then
        SUM(state-2)  <= ((adder_A_bus(state-2) and Bxor(state-2) and adder_sel)
                          or ((adder_A_bus(state-2) or Bxor(state-2) or adder_sel) and (not Cout(state-2))));
        Cout(state-1) <= ((adder_A_bus(state-1) and Bxor(state-1))
                          or (Cout(state-2) and (adder_A_bus(state-1) or Bxor(state-1))));
        state <= state + 1;

      -- states 3 to 32: sum bit (state-2) and carry out of bit (state-1)
      elsif state >= 3 and state <= 32 then
        SUM(state-2)  <= ((adder_A_bus(state-2) and Bxor(state-2) and Cout(state-3))
                          or ((adder_A_bus(state-2) or Bxor(state-2) or Cout(state-3)) and (not Cout(state-2))));
        Cout(state-1) <= ((adder_A_bus(state-1) and Bxor(state-1))
                          or (Cout(state-2) and (adder_A_bus(state-1) or Bxor(state-1))));
        state <= state + 1;

      -- state 33: final sum bit
      elsif state = 33 then
        SUM(31) <= ((adder_A_bus(31) and Bxor(31) and Cout(30))
                    or ((adder_A_bus(31) or Bxor(31) or Cout(30)) and (not Cout(31))));
        state <= state + 1;

      -- state 34: present the result and flag completion
      elsif state = 34 then
        adder_C_bus <= SUM;
        adder_done <= '1';
      end if;

    else
      adder_done <= '0';
      state <= 0;
    end if;

  end process addsub;
end behavior;


-- Project: Thesis
-- Filename: mult.vhd
-- Other files required:
-- Date: Oct 10 97
-- Entity/Architecture Name: mult32_e/behavior
-- Developer: Steve Parmley
-- Function:
-- Limitations:
-- History:
-- Last Analyzed On:

library IEEE;
use IEEE.std_logic_1164.all;

entity mult_e is
  port (mult_reset : in  std_ulogic;
        mult_clk   : in  std_ulogic;
        mult_A_bus : in  std_ulogic_vector(15 downto 0);
        mult_B_bus : in  std_ulogic_vector(15 downto 0);
        mult_go    : in  std_ulogic;
        mult_done  : out std_ulogic;
        mult_C_bus : out std_ulogic_vector(15 downto 0));
end mult_e;

architecture behavior of mult_e is

  signal state, state2 : integer;

  signal result00, result01, result02, result03, result04, result05, result06, result07,
         result08, result09, result10, result11, result12, result13, result14, result15
         : std_ulogic_vector(31 downto 0);

  signal sys_clk, sys_reset, go, sel, done : std_ulogic := '0';
  signal A, B, result : std_ulogic_vector(31 downto 0);

  component adder32_e
    port (adder_reset : in  std_ulogic;
          adder_clk   : in  std_ulogic;
          adder_A_bus : in  std_ulogic_vector(31 downto 0);
          adder_B_bus : in  std_ulogic_vector(31 downto 0);
          adder_go    : in  std_ulogic;
          adder_sel   : in  std_ulogic;
          adder_done  : out std_ulogic;
          adder_C_bus : out std_ulogic_vector(31 downto 0));
  end component;

begin

  U1 : adder32_e
    PORT MAP (sys_reset,
              sys_clk,
              A,
              B,
              go,
              sel,
              done,
              result);

  sys_clk   <= mult_clk;
  sel       <= '0';
  sys_reset <= mult_reset;

  addsub : process
  begin
    wait until mult_clk'event and mult_clk = '1';

    if mult_reset = '1' then
      state <= 0;
    end if;

    if mult_go = '1' then

      -- state 0: clear the sixteen partial products
      if state = 0 then
        result00 <= "00000000000000000000000000000000";
        result01 <= "00000000000000000000000000000000";
        result02 <= "00000000000000000000000000000000";
        result03 <= "00000000000000000000000000000000";
        result04 <= "00000000000000000000000000000000";
        result05 <= "00000000000000000000000000000000";
        result06 <= "00000000000000000000000000000000";
        result07 <= "00000000000000000000000000000000";
        result08 <= "00000000000000000000000000000000";
        result09 <= "00000000000000000000000000000000";
        result10 <= "00000000000000000000000000000000";
        result11 <= "00000000000000000000000000000000";
        result12 <= "00000000000000000000000000000000";
        result13 <= "00000000000000000000000000000000";
        result14 <= "00000000000000000000000000000000";
        result15 <= "00000000000000000000000000000000";

        state <= 1;

      -- state 1: for each set bit i of B, load A shifted left by i
      elsif state = 1 then
        for i in 0 to 15 loop
          if mult_B_bus(i) = '1' then
            case i is
              when 0  => result00(15 downto 0)  <= mult_A_bus;
              when 1  => result01(16 downto 1)  <= mult_A_bus;
              when 2  => result02(17 downto 2)  <= mult_A_bus;
              when 3  => result03(18 downto 3)  <= mult_A_bus;
              when 4  => result04(19 downto 4)  <= mult_A_bus;
              when 5  => result05(20 downto 5)  <= mult_A_bus;
              when 6  => result06(21 downto 6)  <= mult_A_bus;
              when 7  => result07(22 downto 7)  <= mult_A_bus;
              when 8  => result08(23 downto 8)  <= mult_A_bus;
              when 9  => result09(24 downto 9)  <= mult_A_bus;
              when 10 => result10(25 downto 10) <= mult_A_bus;
              when 11 => result11(26 downto 11) <= mult_A_bus;
              when 12 => result12(27 downto 12) <= mult_A_bus;
              when 13 => result13(28 downto 13) <= mult_A_bus;
              when 14 => result14(29 downto 14) <= mult_A_bus;
              when 15 => result15(30 downto 15) <= mult_A_bus;
              when others => null;
            end case;
          end if;
        end loop;
        state <= 2;

      -- states 2 and 3: first addition, result00 + result01
      elsif state = 2 then
        go <= '0';
        if done = '0' then
          A <= result00;
          B <= result01;
          state <= 3;
        end if;

      elsif state = 3 then
        go <= '1';
        if done = '1' then
          state <= 4;
          state2 <= 0;
        end if;

      -- states 4 to 17: accumulate the remaining partial products
      elsif state >= 4 and state <= 17 then
        if state2 = 0 then
          go <= '0';
          if done = '0' then
            A <= result;
            case state is
              when 4  => B <= result02;
              when 5  => B <= result03;
              when 6  => B <= result04;
              when 7  => B <= result05;
              when 8  => B <= result06;
              when 9  => B <= result07;
              when 10 => B <= result08;
              when 11 => B <= result09;
              when 12 => B <= result10;
              when 13 => B <= result11;
              when 14 => B <= result12;
              when 15 => B <= result13;
              when 16 => B <= result14;
              when 17 => B <= result15;
              when others => null;
            end case;
            state2 <= 1;
          end if;

        elsif state2 = 1 then
          go <= '1';
          if done = '1' then
            state2 <= 0;
            state <= state + 1;
          end if;
        end if;

      -- state 18: return the middle 16 bits of the 32-bit product
      elsif state = 18 then
        mult_C_bus <= result(23 downto 8);
        mult_done <= '1';
      end if;

    else
      mult_done <= '0';
      state <= 0;
    end if;

  end process addsub;
end behavior;
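
The multiplier above builds up to sixteen shifted copies of mult_A_bus, sums them through adder32_e, and returns bits 23 downto 8 of the 32-bit total. That slice is consistent with operands carrying 8 fractional bits, which is an inference from the listing (the register file also forces its constant "one" to "0000000100000000") rather than a statement in the text. The function below is a hedged reference sketch of the same shift-and-add arithmetic. The names mult_ref_pkg and mult_ref are hypothetical, and the sketch assumes the IEEE numeric_std package is available, which the original listings do not use. It treats both operands as unsigned, as the listing does, and it does not model the go/done handshake.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;  -- assumption: not used by the original design

-- Hypothetical reference package (not in the thesis listings).
package mult_ref_pkg is
  function mult_ref (a, b : std_ulogic_vector(15 downto 0))
    return std_ulogic_vector;
end mult_ref_pkg;

package body mult_ref_pkg is
  function mult_ref (a, b : std_ulogic_vector(15 downto 0))
    return std_ulogic_vector is
    variable acc : unsigned(31 downto 0) := (others => '0');  -- running sum
    variable pp  : unsigned(31 downto 0);                     -- one partial product
  begin
    for i in 0 to 15 loop
      if b(i) = '1' then
        -- A shifted left by i, matching resultNN(i+15 downto i) <= mult_A_bus
        pp := (others => '0');
        pp(i + 15 downto i) := unsigned(a);
        acc := acc + pp;
      end if;
    end loop;
    -- same slice as mult_C_bus <= result(23 downto 8)
    return std_ulogic_vector(acc(23 downto 8));
  end mult_ref;
end mult_ref_pkg;
```

A testbench could compare the hardware result against this function once done rises, for example with `assert result = mult_ref(A, B) report "multiplier mismatch" severity error;`.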


B.3.2  Multiplier  Testbench 


-- Project: Thesis
-- Filename: adder32-bench.vhd
-- Other files required:
-- Date: sept 30 97
-- Entity/Architecture Name: adder32_tb/test
-- Developer: Steve Parmley
-- Function:
-- Limitations:
-- History:
-- Last Analyzed On:

library IEEE;
use IEEE.std_logic_1164.all;

entity adder32_tb is
end adder32_tb;

architecture test of adder32_tb is

  constant Atest00 : std_ulogic_vector(31 downto 0) := "00000000000000000000000000000000";
  constant Atest01 : std_ulogic_vector(31 downto 0) := "00000000000000000000000000000001";
  constant Atest02 : std_ulogic_vector(31 downto 0) := "00000000000000000000000000000010";
  constant Atest03 : std_ulogic_vector(31 downto 0) := "00000000000000000000000000000011";
  constant Atest04 : std_ulogic_vector(31 downto 0) := "01010101010101010101010101010101";
  constant Atest05 : std_ulogic_vector(31 downto 0) := "10101010101010101010101010101010";
  constant Atest06 : std_ulogic_vector(31 downto 0) := "11111111111111111111111111111110";
  constant Atest07 : std_ulogic_vector(31 downto 0) := "11111111011111111111111101111111";
  constant Atest08 : std_ulogic_vector(31 downto 0) := "01111111111111111111111111111111";
  constant Atest09 : std_ulogic_vector(31 downto 0) := "11111111111111111111111111111111";

  constant Btest00 : std_ulogic_vector(31 downto 0) := "00000000000000000000000000000000";
  constant Btest01 : std_ulogic_vector(31 downto 0) := "00000000000000000000000000000001";
  constant Btest02 : std_ulogic_vector(31 downto 0) := "00000000000000000000000000000010";
  constant Btest03 : std_ulogic_vector(31 downto 0) := "00000000000000000000000100000000";
  constant Btest04 : std_ulogic_vector(31 downto 0) := "10000000000000000000000000000000";
  constant Btest05 : std_ulogic_vector(31 downto 0) := "11111111111111111111111111111111";

  constant add : std_ulogic := '0';
  constant sub : std_ulogic := '1';

  component adder32_e
    port (adder_reset : in  std_ulogic;
          adder_clk   : in  std_ulogic;
          adder_A_bus : in  std_ulogic_vector(31 downto 0);
          adder_B_bus : in  std_ulogic_vector(31 downto 0);
          adder_go    : in  std_ulogic;
          adder_sel   : in  std_ulogic;
          adder_done  : out std_ulogic;
          adder_C_bus : out std_ulogic_vector(31 downto 0));
  end component;

  signal sys_clk, sys_reset, go, sel, done : std_ulogic := '0';
  signal A, B, result : std_ulogic_vector(31 downto 0);

begin

  U1 : adder32_e
    PORT MAP (sys_reset,
              sys_clk,
              A,
              B,
              go,
              sel,
              done,
              result);


clock :  process 
begin 

sys_clk  <=  not(sys_clk); 
wait for 10 ps;
end  process  clock; 


exercise : process
  variable inputA, inputB : std_ulogic_vector(31 downto 0);
begin

  sys_reset <= '0';

  for i in 0 to 1 loop
    -- add or sub
    CASE i IS
      WHEN 0 => sel <= add;
      WHEN 1 => sel <= sub;
    END CASE;

    for j in 0 to 9 loop
      for l in 0 to 5 loop
        -- pick a test
        CASE j IS
          WHEN 0 => inputA := Atest00;
          WHEN 1 => inputA := Atest01;
          WHEN 2 => inputA := Atest02;
          WHEN 3 => inputA := Atest03;
          WHEN 4 => inputA := Atest04;
          WHEN 5 => inputA := Atest05;
          WHEN 6 => inputA := Atest06;
          WHEN 7 => inputA := Atest07;
          WHEN 8 => inputA := Atest08;
          WHEN 9 => inputA := Atest09;
        END CASE;

        CASE l IS
          WHEN 0 => inputB := Btest00;
          WHEN 1 => inputB := Btest01;
          WHEN 2 => inputB := Btest02;
          WHEN 3 => inputB := Btest03;
          WHEN 4 => inputB := Btest04;
          WHEN 5 => inputB := Btest05;
        END CASE;

        go <= '0';
        wait until done = '0';

        -- drive the operand buses bit by bit
        FOR k IN 0 TO 31 loop
          A(k) <= inputA(k);
          B(k) <= inputB(k);
        end loop;

        wait until sys_clk'event and sys_clk = '0';
        go <= '1';
        wait until done = '1';

        go <= '0';
      end loop;
    end loop;
  end loop;


  wait until sys_clk'event and sys_clk = '0';
  wait until sys_clk'event and sys_clk = '0';
  wait until sys_clk'event and sys_clk = '0';
  wait until sys_clk'event and sys_clk = '0';
  wait until sys_clk'event and sys_clk = '0';
  wait until sys_clk'event and sys_clk = '0';
  wait until sys_clk'event and sys_clk = '0';
  wait until sys_clk'event and sys_clk = '0';
  wait until sys_clk'event and sys_clk = '0';
  wait until sys_clk'event and sys_clk = '0';
  wait until sys_clk'event and sys_clk = '0';
  wait until sys_clk'event and sys_clk = '0';
  wait until sys_clk'event and sys_clk = '0';
  wait until sys_clk'event and sys_clk = '0';
  wait until sys_clk'event and sys_clk = '0';
  wait until sys_clk'event and sys_clk = '0';
  wait until sys_clk'event and sys_clk = '0';
  wait until sys_clk'event and sys_clk = '0';


ASSERT  false 

REPORT  "DONE" 
SEVERITY  failure; 
end  process  exercise; 
end  test; 


CONFIGURATION adder32_c OF adder32_tb IS
  FOR test
    FOR ALL: adder32_e
      USE ENTITY WORK.adder32_e(behavior);
    END FOR;
  END FOR;
END adder32_c;


-- Project: Thesis
-- Filename: mult-bench.vhd
-- Other files required:
-- Date: oct 10 97
-- Entity/Architecture Name: mult_tb/test
-- Developer: Steve Parmley
-- Function:
-- Limitations:
-- History:
-- Last Analyzed On:

library IEEE;
use IEEE.std_logic_1164.all;

entity mult_tb is
end mult_tb;

architecture test of mult_tb is

  constant Atest00 : std_ulogic_vector(15 downto 0) := "0000000000000000";
  constant Atest01 : std_ulogic_vector(15 downto 0) := "0000000000000001";
  constant Atest02 : std_ulogic_vector(15 downto 0) := "0000000000000010";
  constant Atest03 : std_ulogic_vector(15 downto 0) := "0000000000000011";
  constant Atest04 : std_ulogic_vector(15 downto 0) := "0101010101010101";
  constant Atest05 : std_ulogic_vector(15 downto 0) := "1010101010101010";
  constant Atest06 : std_ulogic_vector(15 downto 0) := "1111111111111110";
  constant Atest07 : std_ulogic_vector(15 downto 0) := "1111111101111111";
  constant Atest08 : std_ulogic_vector(15 downto 0) := "0111111111111111";
  constant Atest09 : std_ulogic_vector(15 downto 0) := "1111111111111111";

  constant Btest00 : std_ulogic_vector(15 downto 0) := "0000000000000000";  -- +/- 0
  constant Btest01 : std_ulogic_vector(15 downto 0) := "0000000000000001";  -- +/- 1
  constant Btest02 : std_ulogic_vector(15 downto 0) := "0000000000000010";  -- +/- 2
  constant Btest03 : std_ulogic_vector(15 downto 0) := "0000000100000000";  -- +/- 256
  constant Btest04 : std_ulogic_vector(15 downto 0) := "1000000000000000";  -- +/- 32K
  constant Btest05 : std_ulogic_vector(15 downto 0) := "1111111111111111";  -- +/- 65534

  constant add : std_ulogic := '0';
  constant sub : std_ulogic := '1';

  component mult_e
    port (mult_reset : in  std_ulogic;
          mult_clk   : in  std_ulogic;
          mult_A_bus : in  std_ulogic_vector(15 downto 0);
          mult_B_bus : in  std_ulogic_vector(15 downto 0);
          mult_go    : in  std_ulogic;
          mult_done  : out std_ulogic;
          mult_C_bus : out std_ulogic_vector(15 downto 0));
  end component;

  signal sys_clk, sys_reset, go, done : std_ulogic := '0';
  signal A, B, result : std_ulogic_vector(15 downto 0);

begin

  U1 : mult_e
    PORT MAP (sys_reset,
              sys_clk,
              A,
              B,
              go,
              done,
              result);


clock :  process 
begin 

sys_clk  <=  not(sys_clk); 
wait  for  10  ps; 
end  process  clock; 


exercise : process
  variable inputA, inputB : std_ulogic_vector(15 downto 0);
begin

  sys_reset <= '0';

  for j in 0 to 9 loop
    for l in 0 to 5 loop
      -- pick a test
      CASE j IS
        WHEN 0 => inputA := Atest00;
        WHEN 1 => inputA := Atest01;
        WHEN 2 => inputA := Atest02;
        WHEN 3 => inputA := Atest03;
        WHEN 4 => inputA := Atest04;
        WHEN 5 => inputA := Atest05;
        WHEN 6 => inputA := Atest06;
        WHEN 7 => inputA := Atest07;
        WHEN 8 => inputA := Atest08;
        WHEN 9 => inputA := Atest09;
      END CASE;

      CASE l IS
        WHEN 0 => inputB := Btest00;
        WHEN 1 => inputB := Btest01;
        WHEN 2 => inputB := Btest02;
        WHEN 3 => inputB := Btest03;
        WHEN 4 => inputB := Btest04;
        WHEN 5 => inputB := Btest05;
      END CASE;

      wait until done = '0';

      -- drive the operand buses bit by bit
      FOR k IN 0 TO 15 loop
        A(k) <= inputA(k);
        B(k) <= inputB(k);
      end loop;

      wait until sys_clk'event and sys_clk = '0';
      go <= '1';
      wait until done = '1';

      go <= '0';
    end loop;
  end loop;


  wait until sys_clk'event and sys_clk = '0';
  wait until sys_clk'event and sys_clk = '0';
  wait until sys_clk'event and sys_clk = '0';
  wait until sys_clk'event and sys_clk = '0';
  wait until sys_clk'event and sys_clk = '0';
  wait until sys_clk'event and sys_clk = '0';
  wait until sys_clk'event and sys_clk = '0';
  wait until sys_clk'event and sys_clk = '0';
  wait until sys_clk'event and sys_clk = '0';
  wait until sys_clk'event and sys_clk = '0';
  wait until sys_clk'event and sys_clk = '0';
  wait until sys_clk'event and sys_clk = '0';
  wait until sys_clk'event and sys_clk = '0';
  wait until sys_clk'event and sys_clk = '0';
  wait until sys_clk'event and sys_clk = '0';
  wait until sys_clk'event and sys_clk = '0';
  wait until sys_clk'event and sys_clk = '0';
  wait until sys_clk'event and sys_clk = '0';


ASSERT  false 

REPORT  "DONE" 


SEVERITY  failure; 
end  process  exercise; 
end  test; 


CONFIGURATION mult_c OF mult_tb IS
  FOR test
    FOR ALL: mult_e
      USE ENTITY WORK.mult_e(behavior);
    END FOR;
  END FOR;
END mult_c;

B.3.3 Multiplier Results

B.4 Register Unit

B.4.1  Register  Model 


-- Project: Thesis
-- Filename: reg_file_pkg.vhd
-- Other files required:
-- Date: sept 23 97
-- Entity/Architecture Name: na
-- Developer: Steve Parmley

library IEEE;
use IEEE.std_logic_1164.all;

package reg_file_pkg is
  subtype addr is integer range 31 downto 0;
end reg_file_pkg;

-- Project: Thesis
-- Filename: reg_file.vhd
-- Other files required: reg_file_pkg.vhd
-- Date: sept 23 97
-- Entity/Architecture Name: reg_file_e/behavior
-- Developer: Steve Parmley

library IEEE;
use IEEE.std_logic_1164.all;
use WORK.reg_file_pkg.all;

entity reg_file_e is
  port (reg_file_reset       : in  std_ulogic;
        reg_file_clk         : in  std_ulogic;
        reg_file_C_bus       : in  std_ulogic_vector(15 downto 0);
        reg_file_C_reg_latch : in  std_ulogic;
        reg_file_C_reg_addr  : in  addr;
        reg_file_A_bus       : out std_ulogic_vector(15 downto 0);
        reg_file_A_reg_addr  : in  addr;
        reg_file_B_bus       : out std_ulogic_vector(15 downto 0);
        reg_file_B_reg_addr  : in  addr);
end reg_file_e;

architecture behavior of reg_file_e is
begin

  registers : process
    subtype reg is std_ulogic_vector(15 downto 0);
    type bank is array(31 downto 0) of reg;
    variable regs : bank;
  begin

    if reg_file_reset = '1' then
      for index in 31 downto 2 loop
        regs(index) := "0000000000000000";
      end loop;
      -- force reg 0 and 1 to zero and one values
      regs(0) := "0000000000000000";
      regs(1) := "0000000100000000";
    end if;

    wait until (reg_file_clk'event and reg_file_clk = '1');

    -- take care of write function first
    if reg_file_C_reg_latch = '1' then
      if (reg_file_C_reg_addr = 0) or (reg_file_C_reg_addr = 1) then
        -- can not write to the zero and 1 registers
        null;
      else
        regs(reg_file_C_reg_addr) := reg_file_C_bus;
      end if;
    end if;

    -- now do A bus
    reg_file_A_bus <= regs(reg_file_A_reg_addr);

    -- now do B bus
    reg_file_B_bus <= regs(reg_file_B_reg_addr);

  end process registers;
end behavior;


B.4.2  Register  Testbench 


-- Project: Thesis
-- Filename: reg_file-bench.vhd
-- Other files required: reg_file_pkg.vhd, reg_file.vhd
-- Date: sept 23 97
-- Entity/Architecture Name: reg_file_tb/test
-- Developer: Steve Parmley

library IEEE;
use IEEE.std_logic_1164.all;
use WORK.reg_file_pkg.all;

entity reg_file_tb is
end reg_file_tb;

architecture test of reg_file_tb is

  component reg_file_e
    port (reg_file_reset       : in  std_ulogic;
          reg_file_clk         : in  std_ulogic;
          reg_file_C_bus       : in  std_ulogic_vector(15 downto 0);
          reg_file_C_reg_latch : in  std_ulogic;
          reg_file_C_reg_addr  : in  addr;
          reg_file_A_bus       : out std_ulogic_vector(15 downto 0);
          reg_file_A_reg_addr  : in  addr;
          reg_file_B_bus       : out std_ulogic_vector(15 downto 0);
          reg_file_B_reg_addr  : in  addr);
  end component;

  signal sys_reset, sys_clk : std_ulogic := '0';
  signal bus_C, bus_A, bus_B : std_ulogic_vector(15 downto 0);
  signal reg_addr_A, reg_addr_B, reg_addr_C : addr;
  signal reg_latch_C : std_ulogic;

begin

  U1 : reg_file_e
    PORT MAP (sys_reset,
              sys_clk,
              bus_C,
              reg_latch_C,
              reg_addr_C,
              bus_A,
              reg_addr_A,
              bus_B,
              reg_addr_B);


clcx:k :  process 
begin 

sys_clk  <=  not(sys_clk); 
wait  for  10  ps; 
end  process  dock; 

rst :  process 
begin 

sys_resel  <=  '1'; 
wait  for  5  ps; 
sys^reset  <=  'O'; 
wait  for  15000  ps; 
end  process  rst; 

    exercise : process
    begin

        reg_latch_C <= '0';

        reg_addr_A <= 15;
        reg_addr_B <= 15;
        reg_addr_C <= 0;

        wait until sys_clk'event and sys_clk = '0';

        -- verify that all regs are clear (except for zero regs 0 and 1)
        for i in 31 downto 0 loop
            reg_addr_A <= i;
            -- get B in reverse order to show dual bus works
            reg_addr_B <= 31 - i;
            wait until sys_clk'event and sys_clk = '0';
        end loop;

        reg_addr_A <= 15;
        reg_addr_B <= 15;
        reg_addr_C <= 15;

        wait until sys_clk'event and sys_clk = '0';
        wait until sys_clk'event and sys_clk = '0';
        wait until sys_clk'event and sys_clk = '0';

        -- write some info to the regs
        reg_addr_C <= 0;
        bus_C <= "0100000000000001";
        wait until sys_clk'event and sys_clk = '0';
        reg_latch_C <= '1';
        wait until sys_clk'event and sys_clk = '0';
        reg_latch_C <= '0';

        reg_addr_C <= 1;
        bus_C <= "0100000000000010";
        wait until sys_clk'event and sys_clk = '0';
        reg_latch_C <= '1';
        wait until sys_clk'event and sys_clk = '0';
        reg_latch_C <= '0';

        reg_addr_C <= 2;
        bus_C <= "0100000000000011";
        wait until sys_clk'event and sys_clk = '0';
        reg_latch_C <= '1';
        wait until sys_clk'event and sys_clk = '0';
        reg_latch_C <= '0';

        reg_addr_C <= 3;
        bus_C <= "0100000000000100";
        wait until sys_clk'event and sys_clk = '0';
        reg_latch_C <= '1';
        wait until sys_clk'event and sys_clk = '0';
        reg_latch_C <= '0';

        reg_addr_C <= 4;
        bus_C <= "0100000000000101";
        wait until sys_clk'event and sys_clk = '0';
        reg_latch_C <= '1';
        wait until sys_clk'event and sys_clk = '0';
        reg_latch_C <= '0';

        reg_addr_C <= 5;
        bus_C <= "0100000000000110";
        wait until sys_clk'event and sys_clk = '0';
        reg_latch_C <= '1';
        wait until sys_clk'event and sys_clk = '0';
        reg_latch_C <= '0';

        reg_addr_C <= 6;
        bus_C <= "0100000000000111";
        wait until sys_clk'event and sys_clk = '0';
        reg_latch_C <= '1';
        wait until sys_clk'event and sys_clk = '0';
        reg_latch_C <= '0';

        reg_addr_C <= 7;
        bus_C <= "0100000000001000";
        wait until sys_clk'event and sys_clk = '0';
        reg_latch_C <= '1';
        wait until sys_clk'event and sys_clk = '0';
        reg_latch_C <= '0';

        reg_addr_C <= 8;
        bus_C <= "0100000000001001";
        wait until sys_clk'event and sys_clk = '0';
        reg_latch_C <= '1';
        wait until sys_clk'event and sys_clk = '0';
        reg_latch_C <= '0';

        reg_addr_C <= 9;
        bus_C <= "0100000000001010";
        wait until sys_clk'event and sys_clk = '0';
        reg_latch_C <= '1';
        wait until sys_clk'event and sys_clk = '0';
        reg_latch_C <= '0';

        reg_addr_C <= 10;
        bus_C <= "0100000000001011";
        wait until sys_clk'event and sys_clk = '0';
        reg_latch_C <= '1';
        wait until sys_clk'event and sys_clk = '0';
        reg_latch_C <= '0';

        reg_addr_C <= 11;
        bus_C <= "0100000000001100";
        wait until sys_clk'event and sys_clk = '0';
        reg_latch_C <= '1';
        wait until sys_clk'event and sys_clk = '0';
        reg_latch_C <= '0';

        reg_addr_C <= 12;
        bus_C <= "0100000000001101";
        wait until sys_clk'event and sys_clk = '0';
        reg_latch_C <= '1';
        wait until sys_clk'event and sys_clk = '0';
        reg_latch_C <= '0';

        reg_addr_C <= 13;
        bus_C <= "0100000000001110";
        wait until sys_clk'event and sys_clk = '0';
        reg_latch_C <= '1';
        wait until sys_clk'event and sys_clk = '0';
        reg_latch_C <= '0';

        reg_addr_C <= 14;
        bus_C <= "0100000000001111";
        wait until sys_clk'event and sys_clk = '0';
        reg_latch_C <= '1';
        wait until sys_clk'event and sys_clk = '0';
        reg_latch_C <= '0';

        reg_addr_C <= 15;
        bus_C <= "0100000000010000";
        wait until sys_clk'event and sys_clk = '0';
        reg_latch_C <= '1';
        wait until sys_clk'event and sys_clk = '0';
        reg_latch_C <= '0';

        reg_addr_C <= 16;
        bus_C <= "1000000000000001";
        wait until sys_clk'event and sys_clk = '0';
        reg_latch_C <= '1';
        wait until sys_clk'event and sys_clk = '0';
        reg_latch_C <= '0';

        reg_addr_C <= 17;
        bus_C <= "1000000000000010";
        wait until sys_clk'event and sys_clk = '0';
        reg_latch_C <= '1';
        wait until sys_clk'event and sys_clk = '0';
        reg_latch_C <= '0';

        reg_addr_C <= 18;
        bus_C <= "1000000000000011";
        wait until sys_clk'event and sys_clk = '0';
        reg_latch_C <= '1';
        wait until sys_clk'event and sys_clk = '0';
        reg_latch_C <= '0';

        reg_addr_C <= 19;
        bus_C <= "1000000000000100";
        wait until sys_clk'event and sys_clk = '0';
        reg_latch_C <= '1';
        wait until sys_clk'event and sys_clk = '0';
        reg_latch_C <= '0';

        reg_addr_C <= 20;
        bus_C <= "1000000000000101";
        wait until sys_clk'event and sys_clk = '0';
        reg_latch_C <= '1';
        wait until sys_clk'event and sys_clk = '0';
        reg_latch_C <= '0';

        reg_addr_C <= 21;
        bus_C <= "1000000000000110";
        wait until sys_clk'event and sys_clk = '0';
        reg_latch_C <= '1';
        wait until sys_clk'event and sys_clk = '0';
        reg_latch_C <= '0';

        reg_addr_C <= 22;
        bus_C <= "1000000000000111";
        wait until sys_clk'event and sys_clk = '0';
        reg_latch_C <= '1';
        wait until sys_clk'event and sys_clk = '0';
        reg_latch_C <= '0';

        reg_addr_C <= 23;
        bus_C <= "1000000000001000";
        wait until sys_clk'event and sys_clk = '0';
        reg_latch_C <= '1';
        wait until sys_clk'event and sys_clk = '0';
        reg_latch_C <= '0';

        reg_addr_C <= 24;
        bus_C <= "1000000000001001";
        wait until sys_clk'event and sys_clk = '0';
        reg_latch_C <= '1';
        wait until sys_clk'event and sys_clk = '0';
        reg_latch_C <= '0';

        reg_addr_C <= 25;
        bus_C <= "1000000000001010";
        wait until sys_clk'event and sys_clk = '0';
        reg_latch_C <= '1';
        wait until sys_clk'event and sys_clk = '0';
        reg_latch_C <= '0';

        reg_addr_C <= 26;
        bus_C <= "1000000000001011";
        wait until sys_clk'event and sys_clk = '0';
        reg_latch_C <= '1';
        wait until sys_clk'event and sys_clk = '0';
        reg_latch_C <= '0';

        reg_addr_C <= 27;
        bus_C <= "1000000000001100";
        wait until sys_clk'event and sys_clk = '0';
        reg_latch_C <= '1';
        wait until sys_clk'event and sys_clk = '0';
        reg_latch_C <= '0';

        reg_addr_C <= 28;
        bus_C <= "1000000000001101";
        wait until sys_clk'event and sys_clk = '0';
        reg_latch_C <= '1';
        wait until sys_clk'event and sys_clk = '0';
        reg_latch_C <= '0';

        reg_addr_C <= 29;
        bus_C <= "1000000000001110";
        wait until sys_clk'event and sys_clk = '0';
        reg_latch_C <= '1';
        wait until sys_clk'event and sys_clk = '0';
        reg_latch_C <= '0';

        reg_addr_C <= 30;
        bus_C <= "1000000000001111";
        wait until sys_clk'event and sys_clk = '0';
        reg_latch_C <= '1';
        wait until sys_clk'event and sys_clk = '0';
        reg_latch_C <= '0';

        reg_addr_C <= 31;
        bus_C <= "1000000000010000";
        wait until sys_clk'event and sys_clk = '0';
        reg_latch_C <= '1';
        wait until sys_clk'event and sys_clk = '0';
        reg_latch_C <= '0';

        reg_addr_C <= 15;

        wait until sys_clk'event and sys_clk = '0';
        wait until sys_clk'event and sys_clk = '0';
        wait until sys_clk'event and sys_clk = '0';

        -- verify that all regs are correct (except for the fixed regs 0 and 1)
        for i in 31 downto 0 loop
            reg_addr_A <= i;
            reg_addr_B <= 31 - i;
            wait until sys_clk'event and sys_clk = '0';
        end loop;

        wait until sys_clk'event and sys_clk = '1';

        ASSERT false
            REPORT "DONE"
            SEVERITY failure;

    end process exercise;

end test;

CONFIGURATION reg_file_c OF reg_file_tb IS
    FOR test
        FOR ALL: reg_file_e
            USE ENTITY WORK.reg_file_e(behavior);
        END FOR;
    END FOR;
END reg_file_c;
B.5.1 Latch Model


-- Project:                   Thesis
-- Filename:                  latch.vhd
-- Other files required:
-- Date:                      Oct 17 97
-- Entity/Architecture Name:  latch_e/behavior
-- Developer:                 Steve Parmley
-- Function:
-- Limitations:
-- History:
-- Last Analyzed On:

library IEEE;
use IEEE.std_logic_1164.all;

entity latch_e is
    port (latch_en    : in  std_ulogic;
          latch_A_bus : in  std_ulogic_vector(15 downto 0);
          latch_O_bus : out std_ulogic_vector(15 downto 0));
end latch_e;

architecture behavior of latch_e is
begin

    latch : process (latch_en, latch_A_bus)
    begin
        if latch_en = '1' then
            latch_O_bus <= latch_A_bus;
        end if;
    end process latch;

end behavior;


B.5.2  Latch  Testbench 


-- Project:                   Thesis
-- Filename:                  mux4_1-bench.vhd
-- Other files required:
-- Date:                      Oct 17 97
-- Entity/Architecture Name:  mux4_1_tb/test
-- Developer:                 Steve Parmley
-- Function:
-- Limitations:
-- History:
-- Last Analyzed On:

library IEEE;
use IEEE.std_logic_1164.all;

entity latch_tb is
end latch_tb;
architecture test of latch_tb is

    constant Atest0 : std_ulogic_vector(15 downto 0) := "0000000000000000";
    constant Atest1 : std_ulogic_vector(15 downto 0) := "0101010101010101";
    constant Atest2 : std_ulogic_vector(15 downto 0) := "1111111111111111";
    constant Atest3 : std_ulogic_vector(15 downto 0) := "1010101010101010";

    component latch_e
        port (latch_en    : in  std_ulogic;
              latch_A_bus : in  std_ulogic_vector(15 downto 0);
              latch_O_bus : out std_ulogic_vector(15 downto 0));
    end component;

    signal en   : std_ulogic := '0';
    signal A, O : std_ulogic_vector(15 downto 0);

begin

    U1 : latch_e
        PORT MAP (en,
                  A,
                  O);

    exercise : process
    begin

        wait for 5 ps;

        for j in 0 to 3 loop
            CASE j is
                WHEN 0 => A <= Atest0;
                WHEN 1 => A <= Atest1;
                WHEN 2 => A <= Atest2;
                WHEN 3 => A <= Atest3;
            end CASE;

            wait for 5 ps;
            en <= '1';
            wait for 5 ps;
            en <= '0';
            wait for 20 ps;
        end loop;

        ASSERT false
            REPORT "DONE"
            SEVERITY failure;
    end process exercise;
end test;

CONFIGURATION latch_c OF latch_tb IS
    FOR test
        FOR ALL: latch_e
            USE ENTITY WORK.latch_e(behavior);
        END FOR;
    END FOR;
END latch_c;
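
The latch testbench above only drives stimulus; the result has to be read from the waveform. As a hedged sketch (not part of the original thesis code), the same stimulus can be made self-checking with ASSERT statements. The entity name latch_check_tb, the pattern_array type and the report strings are assumptions introduced for illustration; the latch itself is repeated from B.5.1 only so that the file analyzes on its own.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- Latch under test, repeated from B.5.1 so this sketch is self-contained.
entity latch_e is
    port (latch_en    : in  std_ulogic;
          latch_A_bus : in  std_ulogic_vector(15 downto 0);
          latch_O_bus : out std_ulogic_vector(15 downto 0));
end latch_e;

architecture behavior of latch_e is
begin
    latch : process (latch_en, latch_A_bus)
    begin
        if latch_en = '1' then
            latch_O_bus <= latch_A_bus;
        end if;
    end process latch;
end behavior;

library IEEE;
use IEEE.std_logic_1164.all;

-- Hypothetical self-checking testbench (name is an assumption).
entity latch_check_tb is
end latch_check_tb;

architecture test of latch_check_tb is
    type pattern_array is array (0 to 3) of std_ulogic_vector(15 downto 0);
    -- Same four patterns as Atest0..Atest3 in B.5.2.
    constant patterns : pattern_array := ("0000000000000000",
                                          "0101010101010101",
                                          "1111111111111111",
                                          "1010101010101010");
    signal en   : std_ulogic := '0';
    signal A, O : std_ulogic_vector(15 downto 0);
begin
    U1 : entity work.latch_e port map (en, A, O);

    exercise : process
    begin
        wait for 5 ps;
        for j in patterns'range loop
            A <= patterns(j);
            wait for 5 ps;
            en <= '1';
            wait for 5 ps;
            -- enabled: the output must follow the input
            assert O = patterns(j)
                report "latch did not pass the input" severity error;
            en <= '0';
            -- disabled: change the input, the output must hold
            A <= not patterns(j);
            wait for 20 ps;
            assert O = patterns(j)
                report "latch did not hold its value" severity error;
        end loop;
        -- stop the simulation the same way the original testbench does
        assert false report "DONE" severity failure;
    end process exercise;
end test;
```

With this arrangement a run that reaches "DONE" without any earlier error report indicates that both the transparent and the hold behavior were observed for every pattern.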
B.5.3 Latch Results

B.6 Multiplexor

B.6.1  Multiplexor  Model 


-- Project:                   Thesis
-- Filename:                  mux4_1.vhd
-- Other files required:
-- Date:                      Oct 17 97
-- Entity/Architecture Name:  mux4_1_e/behavior
-- Developer:                 Steve Parmley
-- Function:
-- Limitations:
-- History:
-- Last Analyzed On:

library IEEE;
use IEEE.std_logic_1164.all;

entity mux4_1_e is
    port (mux_clk   : in  std_ulogic;
          mux_sel   : in  std_ulogic_vector(1 downto 0);
          mux_A_bus : in  std_ulogic_vector(15 downto 0);
          mux_B_bus : in  std_ulogic_vector(15 downto 0);
          mux_C_bus : in  std_ulogic_vector(15 downto 0);
          mux_D_bus : in  std_ulogic_vector(15 downto 0);
          mux_O_bus : out std_ulogic_vector(15 downto 0));
end mux4_1_e;

architecture behavior of mux4_1_e is
begin

    mux : process
    begin
        wait until mux_clk'event and mux_clk = '1';
        case mux_sel is
            when "00"   => mux_O_bus <= mux_A_bus;
            when "01"   => mux_O_bus <= mux_B_bus;
            when "10"   => mux_O_bus <= mux_C_bus;
            when "11"   => mux_O_bus <= mux_D_bus;
            when others => mux_O_bus <= mux_A_bus;
        end case;
    end process mux;

end behavior;


B.6.2 Multiplexor Testbench



-- Project:                   Thesis
-- Filename:                  mux4_1-bench.vhd
-- Other files required:
-- Date:                      Oct 17 97
-- Entity/Architecture Name:  mux4_1_tb/test
-- Developer:                 Steve Parmley
-- Function:
-- Limitations:
-- History:
-- Last Analyzed On:
library IEEE;
use IEEE.std_logic_1164.all;

entity mux4_1_tb is
end mux4_1_tb;

architecture test of mux4_1_tb is

    constant Atest0 : std_ulogic_vector(15 downto 0) := "0000000000000000";
    constant Btest0 : std_ulogic_vector(15 downto 0) := "0101010101010101";
    constant Ctest0 : std_ulogic_vector(15 downto 0) := "1111111111111111";
    constant Dtest0 : std_ulogic_vector(15 downto 0) := "1010101010101010";
    constant Atest1 : std_ulogic_vector(15 downto 0) := "0000111100001111";
    constant Btest1 : std_ulogic_vector(15 downto 0) := "1111000011110000";
    constant Ctest1 : std_ulogic_vector(15 downto 0) := "1100110011001100";
    constant Dtest1 : std_ulogic_vector(15 downto 0) := "0011001100110011";

    constant A_sel : std_ulogic_vector := "00";
    constant B_sel : std_ulogic_vector := "01";
    constant C_sel : std_ulogic_vector := "10";
    constant D_sel : std_ulogic_vector := "11";

    component mux4_1_e
        port (mux_clk   : in  std_ulogic;
              mux_sel   : in  std_ulogic_vector(1 downto 0);
              mux_A_bus : in  std_ulogic_vector(15 downto 0);
              mux_B_bus : in  std_ulogic_vector(15 downto 0);
              mux_C_bus : in  std_ulogic_vector(15 downto 0);
              mux_D_bus : in  std_ulogic_vector(15 downto 0);
              mux_O_bus : out std_ulogic_vector(15 downto 0));
    end component;

    signal sel           : std_ulogic_vector(1 downto 0) := "11";
    signal A, B, C, D, O : std_ulogic_vector(15 downto 0);
    signal sys_clk       : std_ulogic := '0';


begin

    U1 : mux4_1_e
        PORT MAP (sys_clk,
                  sel,
                  A,
                  B,
                  C,
                  D,
                  O);

    clock : process
    begin
        sys_clk <= not(sys_clk);
        wait for 10 ps;
    end process clock;

    exercise : process
    begin

        wait for 20 ps;

        for j in 0 to 1 loop
            CASE j is
                WHEN 0 => A <= Atest0;
                          B <= Btest0;
                          C <= Ctest0;
                          D <= Dtest0;
                WHEN 1 => A <= Atest1;
                          B <= Btest1;
                          C <= Ctest1;
                          D <= Dtest1;
            end CASE;

            for i in 0 to 3 loop
                CASE i IS
                    WHEN 0 => sel <= A_sel;
                    WHEN 1 => sel <= B_sel;
                    WHEN 2 => sel <= C_sel;
                    WHEN 3 => sel <= D_sel;
                END CASE;

                wait until sys_clk'event and sys_clk = '1';
            end loop;
        end loop;

        ASSERT false
            REPORT "DONE"
            SEVERITY failure;
    end process exercise;
end test;

CONFIGURATION mux4_1_c OF mux4_1_tb IS
    FOR test
        FOR ALL: mux4_1_e
            USE ENTITY WORK.mux4_1_e(behavior);
        END FOR;
    END FOR;
END mux4_1_c;
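
Because the multiplexor registers its output on the rising clock edge, a selection made before an edge should appear on the output just after that edge. The following is a minimal self-checking sketch of that behavior, not the thesis testbench: the names mux4_1_check_tb, word, word_array, sel_t and sel_array, the 25 ps start-up delay and the 1 ps settling delay are all assumptions chosen for illustration, and the multiplexor is repeated from B.6.1 so the file stands alone.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- Multiplexor under test, repeated from B.6.1 so this sketch is self-contained.
entity mux4_1_e is
    port (mux_clk   : in  std_ulogic;
          mux_sel   : in  std_ulogic_vector(1 downto 0);
          mux_A_bus : in  std_ulogic_vector(15 downto 0);
          mux_B_bus : in  std_ulogic_vector(15 downto 0);
          mux_C_bus : in  std_ulogic_vector(15 downto 0);
          mux_D_bus : in  std_ulogic_vector(15 downto 0);
          mux_O_bus : out std_ulogic_vector(15 downto 0));
end mux4_1_e;

architecture behavior of mux4_1_e is
begin
    mux : process
    begin
        wait until mux_clk'event and mux_clk = '1';
        case mux_sel is
            when "00"   => mux_O_bus <= mux_A_bus;
            when "01"   => mux_O_bus <= mux_B_bus;
            when "10"   => mux_O_bus <= mux_C_bus;
            when "11"   => mux_O_bus <= mux_D_bus;
            when others => mux_O_bus <= mux_A_bus;
        end case;
    end process mux;
end behavior;

library IEEE;
use IEEE.std_logic_1164.all;

-- Hypothetical self-checking testbench (name is an assumption).
entity mux4_1_check_tb is
end mux4_1_check_tb;

architecture test of mux4_1_check_tb is
    subtype word  is std_ulogic_vector(15 downto 0);
    subtype sel_t is std_ulogic_vector(1 downto 0);
    type word_array is array (0 to 3) of word;
    type sel_array  is array (0 to 3) of sel_t;
    -- Same values as Atest1..Dtest1 in B.6.2; all four are distinct.
    constant inputs : word_array := ("0000111100001111", "1111000011110000",
                                     "1100110011001100", "0011001100110011");
    constant sels   : sel_array  := ("00", "01", "10", "11");
    signal A       : word := inputs(0);
    signal B       : word := inputs(1);
    signal C       : word := inputs(2);
    signal D       : word := inputs(3);
    signal O       : word;
    signal sel     : sel_t := "11";
    signal sys_clk : std_ulogic := '0';
begin
    U1 : entity work.mux4_1_e port map (sys_clk, sel, A, B, C, D, O);

    clock : process
    begin
        sys_clk <= not sys_clk;
        wait for 10 ps;
    end process clock;

    exercise : process
    begin
        wait for 25 ps;               -- start between clock edges
        for i in 0 to 3 loop
            sel <= sels(i);
            wait until sys_clk'event and sys_clk = '1';
            wait for 1 ps;            -- let the registered output update
            assert O = inputs(i)
                report "mux output does not match selected input"
                severity error;
        end loop;
        assert false report "DONE" severity failure;
    end process exercise;
end test;
```

Using four distinct input words means a wrong selection cannot pass by coincidence, which is the same reason the original testbench uses different patterns on each bus.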


B.6.3 Multiplexor Results

B.7 FKP Core

B.7.1  FKP  Core  Model 


-- Project:                   Thesis
-- Filename:                  fkp_core_core.vhd
-- Other files required:      all FKP files
-- Date:                      Oct 17 97
-- Entity/Architecture Name:  fkp_core_e/behavior
-- Developer:                 Steve Parmley
-- Function:
-- Limitations:
-- History:
-- Last Analyzed On:

library IEEE;
use IEEE.std_logic_1164.all;
use WORK.reg_file_pkg.all;

entity fkp_core_e is
    port (fkp_core_clk            : in  std_ulogic;
          fkp_core_reset          : in  std_ulogic;
          fkp_core_data_in        : in  std_ulogic_vector(15 downto 0);
          fkp_core_data_out       : out std_ulogic_vector(15 downto 0);
          fkp_core_data_in_latch  : in  std_ulogic;
          fkp_core_data_out_latch : in  std_ulogic;
          fkp_core_c_reg_latch    : in  std_ulogic;
          fkp_core_c_reg_addr     : in  addr;
          fkp_core_a_reg_addr     : in  addr;
          fkp_core_b_reg_addr     : in  addr;
          fkp_core_cos_sin_ready  : out std_ulogic;
          fkp_core_cos_sin_go     : in  std_ulogic;
          fkp_core_cos_sin_sel    : in  std_ulogic;
          fkp_core_cos_sin_wait   : in  std_ulogic_vector(2 downto 0);
          fkp_core_rom_addr       : out std_ulogic_vector(12 downto 0);
          fkp_core_rom_data       : in  std_ulogic_vector(15 downto 0);
          fkp_core_adder_go       : in  std_ulogic;
          fkp_core_adder_sel      : in  std_ulogic;
          fkp_core_adder_done     : out std_ulogic;
          fkp_core_mult_go        : in  std_ulogic;
          fkp_core_mult_done      : out std_ulogic;
          fkp_core_mux_sel        : in  std_ulogic_vector(1 downto 0));
end fkp_core_e;


architecture structural of fkp_core_e is

    -- SIGNALS
    signal cos_sin_to_mux, adder_to_mux, mult_to_mux, data_in_to_mux : std_ulogic_vector(15 downto 0);
    signal mux_to_regs, A_bus, B_bus : std_ulogic_vector(15 downto 0);

    -- COMPONENTS
    component adder_e
        port (adder_reset : in  std_ulogic;
              adder_clk   : in  std_ulogic;
              adder_A_bus : in  std_ulogic_vector(15 downto 0);
              adder_B_bus : in  std_ulogic_vector(15 downto 0);
              adder_go    : in  std_ulogic;
              adder_sel   : in  std_ulogic;
              adder_done  : out std_ulogic;
              adder_C_bus : out std_ulogic_vector(15 downto 0));
    end component;

    component mult_e
        port (mult_reset : in  std_ulogic;
              mult_clk   : in  std_ulogic;
              mult_A_bus : in  std_ulogic_vector(15 downto 0);
              mult_B_bus : in  std_ulogic_vector(15 downto 0);
              mult_go    : in  std_ulogic;
              mult_done  : out std_ulogic;
              mult_C_bus : out std_ulogic_vector(15 downto 0));
    end component;

    component cos_sin_e
        port (cos_sin_reset    : in  std_ulogic;
              cos_sin_clk      : in  std_ulogic;
              cos_sin_A_bus    : in  std_ulogic_vector(15 downto 0);
              cos_sin_go       : in  std_ulogic;
              cos_sin_sel      : in  std_ulogic;
              cos_sin_wait     : in  std_ulogic_vector(2 downto 0);
              cos_sin_ready    : out std_ulogic;
              cos_sin_C_bus    : out std_ulogic_vector(15 downto 0);
              -- the following describes the connection to the rom
              cos_sin_rom_addr : out std_ulogic_vector(12 downto 0);
              cos_sin_rom_data : in  std_ulogic_vector(15 downto 0));
    end component;

    component reg_file_e
        port (reg_file_reset       : in  std_ulogic;
              reg_file_clk         : in  std_ulogic;
              reg_file_C_bus       : in  std_ulogic_vector(15 downto 0);
              reg_file_C_reg_latch : in  std_ulogic;
              reg_file_C_reg_addr  : in  addr;
              reg_file_A_bus       : out std_ulogic_vector(15 downto 0);
              reg_file_A_reg_addr  : in  addr;
              reg_file_B_bus       : out std_ulogic_vector(15 downto 0);
              reg_file_B_reg_addr  : in  addr);
    end component;

    component latch_e
        port (latch_en    : in  std_ulogic;
              latch_A_bus : in  std_ulogic_vector(15 downto 0);
              latch_O_bus : out std_ulogic_vector(15 downto 0));
    end component;

    component mux4_1_e
        port (mux_clk   : in  std_ulogic;
              mux_sel   : in  std_ulogic_vector(1 downto 0);
              mux_A_bus : in  std_ulogic_vector(15 downto 0);
              mux_B_bus : in  std_ulogic_vector(15 downto 0);
              mux_C_bus : in  std_ulogic_vector(15 downto 0);
              mux_D_bus : in  std_ulogic_vector(15 downto 0);
              mux_O_bus : out std_ulogic_vector(15 downto 0));
    end component;


begin

    U_adder_1 : adder_e
        PORT MAP (fkp_core_reset,
                  fkp_core_clk,
                  A_bus,
                  B_bus,
                  fkp_core_adder_go,
                  fkp_core_adder_sel,
                  fkp_core_adder_done,
                  adder_to_mux);

    U_mult_1 : mult_e
        PORT MAP (fkp_core_reset,
                  fkp_core_clk,
                  A_bus,
                  B_bus,
                  fkp_core_mult_go,
                  fkp_core_mult_done,
                  mult_to_mux);

    U_cos_sin_1 : cos_sin_e
        PORT MAP (fkp_core_reset,
                  fkp_core_clk,
                  A_bus,
                  fkp_core_cos_sin_go,
                  fkp_core_cos_sin_sel,
                  fkp_core_cos_sin_wait,
                  fkp_core_cos_sin_ready,
                  cos_sin_to_mux,
                  fkp_core_rom_addr,
                  fkp_core_rom_data);

    U_reg_file_1 : reg_file_e
        PORT MAP (fkp_core_reset,
                  fkp_core_clk,
                  mux_to_regs,
                  fkp_core_c_reg_latch,
                  fkp_core_c_reg_addr,
                  A_bus,
                  fkp_core_a_reg_addr,
                  B_bus,
                  fkp_core_b_reg_addr);

    U_mux4_1_1 : mux4_1_e
        PORT MAP (fkp_core_clk,
                  fkp_core_mux_sel,
                  cos_sin_to_mux,
                  adder_to_mux,
                  mult_to_mux,
                  data_in_to_mux,
                  mux_to_regs);

    U_latch_in : latch_e
        PORT MAP (fkp_core_data_in_latch,
                  fkp_core_data_in,
                  data_in_to_mux);

    U_latch_out : latch_e
        PORT MAP (fkp_core_data_out_latch,
                  B_bus,
                  fkp_core_data_out);

end structural;
B.7.2 FKP Core Testbench


-- Project:                   Thesis
-- Filename:                  fkp_core-bench.vhd
-- Other files required:      fkp_core.vhd
-- Date:                      Oct 20 97
-- Entity/Architecture Name:  fkp_core_tb/test
-- Developer:                 Steve Parmley

library IEEE;
use IEEE.std_logic_1164.all;
use WORK.reg_file_pkg.all;

entity fkp_core_tb is
end fkp_core_tb;

architecture test of fkp_core_tb is

    component fkp_core_e
        port (fkp_core_clk            : in  std_ulogic;
              fkp_core_reset          : in  std_ulogic;
              fkp_core_data_in        : in  std_ulogic_vector(15 downto 0);
              fkp_core_data_out       : out std_ulogic_vector(15 downto 0);
              fkp_core_data_in_latch  : in  std_ulogic;
              fkp_core_data_out_latch : in  std_ulogic;
              fkp_core_c_reg_latch    : in  std_ulogic;
              fkp_core_c_reg_addr     : in  addr;
              fkp_core_a_reg_addr     : in  addr;
              fkp_core_b_reg_addr     : in  addr;
              fkp_core_cos_sin_ready  : out std_ulogic;
              fkp_core_cos_sin_go     : in  std_ulogic;
              fkp_core_cos_sin_sel    : in  std_ulogic;
              fkp_core_cos_sin_wait   : in  std_ulogic_vector(2 downto 0);
              fkp_core_rom_addr       : out std_ulogic_vector(12 downto 0);
              fkp_core_rom_data       : in  std_ulogic_vector(15 downto 0);
              fkp_core_adder_go       : in  std_ulogic;
              fkp_core_adder_sel      : in  std_ulogic;
              fkp_core_adder_done     : out std_ulogic;
              fkp_core_mult_go        : in  std_ulogic;
              fkp_core_mult_done      : out std_ulogic;
              fkp_core_mux_sel        : in  std_ulogic_vector(1 downto 0));
    end component;

    signal sys_reset, sys_clk : std_ulogic := '0';
    signal a_reg_addr, b_reg_addr, c_reg_addr : addr;
    signal data_in, data_out : std_ulogic_vector(15 downto 0);
    signal data_in_latch, data_out_latch, c_reg_latch, cos_sin_ready : std_ulogic;
    signal cos_sin_go, cos_sin_sel, adder_go, adder_sel, adder_done : std_ulogic;
    signal mult_go, mult_done : std_ulogic;
    signal cos_sin_wait : std_ulogic_vector(2 downto 0);
    signal rom_addr : std_ulogic_vector(12 downto 0);
    signal rom_data : std_ulogic_vector(15 downto 0);
    signal mux_sel : std_ulogic_vector(1 downto 0);

    type opcode is (illegal, movein, moveout, move, cosine, sine, addition, subtraction, multiplication);
    signal instruction : opcode;



begin

    U1 : fkp_core_e
        PORT MAP (sys_clk,
                  sys_reset,
                  data_in,
                  data_out,
                  data_in_latch,
                  data_out_latch,
                  c_reg_latch,
                  c_reg_addr,
                  a_reg_addr,
                  b_reg_addr,
                  cos_sin_ready,
                  cos_sin_go,
                  cos_sin_sel,
                  cos_sin_wait,
                  rom_addr,
                  rom_data,
                  adder_go,
                  adder_sel,
                  adder_done,
                  mult_go,
                  mult_done,
                  mux_sel);

    clock : process
    begin
        sys_clk <= not(sys_clk);
        wait for 10 ps;
    end process clock;

    rst : process
    begin
        sys_reset <= '1';
        wait for 40 ps;
        sys_reset <= '0';
        wait for 50000 ps;
    end process rst;

    exercise : process
    begin

        -- quick test
        instruction <= illegal;
        data_in_latch <= '0';
        data_out_latch <= '0';
        c_reg_latch <= '0';
        cos_sin_go <= '0';
        cos_sin_wait <= "111";
        adder_go <= '0';
        mult_go <= '0';
        a_reg_addr <= 15;
        b_reg_addr <= 15;
        c_reg_addr <= 15;
        mux_sel <= "00";
        wait for 60 ps;
        wait until sys_clk'event and sys_clk = '1';

        -- MOVE IN
        instruction <= movein;
        data_in <= "0000000000000101";
        wait until sys_clk'event and sys_clk = '1';
        mux_sel <= "11";
        c_reg_addr <= 2;
        data_in_latch <= '1';
        wait until sys_clk'event and sys_clk = '1';
        data_in_latch <= '0';
        c_reg_latch <= '1';
        wait until sys_clk'event and sys_clk = '1';
        c_reg_latch <= '0';
        -- END MOVE IN

        -- MOVE OUT
        instruction <= moveout;
        b_reg_addr <= 2;
        wait until sys_clk'event and sys_clk = '1';
        data_out_latch <= '1';
        wait until sys_clk'event and sys_clk = '1';
        data_out_latch <= '0';
        -- END MOVE OUT

        -- MOVE IN
        instruction <= movein;
        data_in <= "0000000001001011";
        wait until sys_clk'event and sys_clk = '1';
        mux_sel <= "11";
        c_reg_addr <= 3;
        data_in_latch <= '1';
        wait until sys_clk'event and sys_clk = '1';
        data_in_latch <= '0';
        c_reg_latch <= '1';
        wait until sys_clk'event and sys_clk = '1';
        c_reg_latch <= '0';
        -- END MOVE IN

        -- MOVE OUT
        instruction <= moveout;
        b_reg_addr <= 3;
        wait until sys_clk'event and sys_clk = '1';
        data_out_latch <= '1';
        wait until sys_clk'event and sys_clk = '1';
        data_out_latch <= '0';
        -- END MOVE OUT

        -- ADD
        instruction <= addition;
        a_reg_addr <= 2;
        b_reg_addr <= 3;
        c_reg_addr <= 10;
        adder_sel <= '0';
        mux_sel <= "01";
        wait until sys_clk'event and sys_clk = '1';
        adder_go <= '1';
        wait until adder_done = '1';

        adder_go <= '0';
        c_reg_latch <= '1';
        wait until sys_clk'event and sys_clk = '1';

        c_reg_latch <= '0';
        -- END ADD

        -- MOVE OUT
        instruction <= moveout;
        b_reg_addr <= 10;
        wait until sys_clk'event and sys_clk = '1';
        data_out_latch <= '1';
        wait until sys_clk'event and sys_clk = '1';
        data_out_latch <= '0';
        -- END MOVE OUT

        -- MOVE
        instruction <= move;
        a_reg_addr <= 0;
        b_reg_addr <= 10;
        c_reg_addr <= 11;
        adder_sel <= '0';
        mux_sel <= "01";
        wait until sys_clk'event and sys_clk = '1';
        adder_go <= '1';
        wait until adder_done = '1';

        adder_go <= '0';
        c_reg_latch <= '1';
        wait until sys_clk'event and sys_clk = '1';
        c_reg_latch <= '0';
        -- END MOVE

        for i in 0 to 3 loop
            -- SUB
            instruction <= subtraction;
            a_reg_addr <= 11;
            b_reg_addr <= 1;
            c_reg_addr <= 11;
            adder_sel <= '1';
            mux_sel <= "01";
            wait until sys_clk'event and sys_clk = '1';
            adder_go <= '1';
            wait until adder_done = '1';

            adder_go <= '0';
            c_reg_latch <= '1';
            wait until sys_clk'event and sys_clk = '1';
            c_reg_latch <= '0';
            -- END SUB

            -- MOVE OUT
            instruction <= moveout;
            b_reg_addr <= 11;
            wait until sys_clk'event and sys_clk = '1';
            data_out_latch <= '1';
            wait until sys_clk'event and sys_clk = '1';
            data_out_latch <= '0';
            -- END MOVE OUT
        end loop;

        -- MULTIPLY
        instruction <= multiplication;
        a_reg_addr <= 2;
        b_reg_addr <= 3;
        c_reg_addr <= 31;
        mux_sel <= "10";
        wait until sys_clk'event and sys_clk = '1';
        mult_go <= '1';
        wait until mult_done = '1';
        mult_go <= '0';
        c_reg_latch <= '1';
        wait until sys_clk'event and sys_clk = '1';
        c_reg_latch <= '0';
        -- END MULTIPLY

        for i in 0 to 31 loop
            -- MOVE OUT
            instruction <= moveout;
            b_reg_addr <= i;
            wait until sys_clk'event and sys_clk = '1';
            data_out_latch <= '1';
            wait until sys_clk'event and sys_clk = '1';
            data_out_latch <= '0';
            -- END MOVE OUT
        end loop;

        -- COSINE
        instruction <= cosine;
        cos_sin_sel <= '0';
        a_reg_addr <= 2;
        mux_sel <= "00";
        c_reg_addr <= 15;
        wait until sys_clk'event and sys_clk = '1';
        cos_sin_go <= '1';
        wait until cos_sin_ready = '1';
        cos_sin_go <= '0';
        c_reg_latch <= '1';
        wait until sys_clk'event and sys_clk = '1';
        c_reg_latch <= '0';

        -- MOVE OUT
        instruction <= moveout;
        b_reg_addr <= 15;
        wait until sys_clk'event and sys_clk = '1';
        data_out_latch <= '1';
        wait until sys_clk'event and sys_clk = '1';
        data_out_latch <= '0';
        -- END MOVE OUT

        -- SINE
        instruction <= sine;
        cos_sin_sel <= '1';
        a_reg_addr <= 3;
        mux_sel <= "00";
        c_reg_addr <= 16;
        wait until sys_clk'event and sys_clk = '1';
        cos_sin_go <= '1';
        wait until cos_sin_ready = '1';
        cos_sin_go <= '0';
        c_reg_latch <= '1';
        wait until sys_clk'event and sys_clk = '1';
        c_reg_latch <= '0';

        -- MOVE OUT
        instruction <= moveout;
        b_reg_addr <= 16;
        wait until sys_clk'event and sys_clk = '1';
        data_out_latch <= '1';
        wait until sys_clk'event and sys_clk = '1';
        data_out_latch <= '0';
        -- END MOVE OUT

        wait until sys_clk'event and sys_clk = '1';
        wait until sys_clk'event and sys_clk = '1';
        wait until sys_clk'event and sys_clk = '1';

        ASSERT false
            REPORT "DONE"
            SEVERITY failure;

    end process exercise;

    rom : process
    begin
        wait until rom_addr'event;

        -- make up some rom data (inverse of the address for now)
        rom_data(12 downto 0) <= not(rom_addr(12 downto 0));

        -- fill in the rest
        rom_data(15 downto 13) <= "000";
    end process rom;

end test;

CONFIGURATION fkp_core_c OF fkp_core_tb IS
    FOR test
        FOR ALL: fkp_core_e
            USE ENTITY WORK.fkp_core_e(structural);
        END FOR;
    END FOR;
END fkp_core_c;
B.5 Microstore


B.5.1 Microstore Model


-- Project:                   Thesis
-- Filename:                  microstore_head.vhd
-- Other files required:
-- Date:                      Oct 31 97
-- Entity/Architecture Name:  n/a
-- Developer:                 Steve Parmley

library IEEE;
use IEEE.std_logic_1164.all;
use WORK.reg_file_pkg.all;

package MICROSTORE is

    procedure move_in (SIGNAL reg:            in  addr;
                       SIGNAL sys_clk:        in  std_ulogic;
                       SIGNAL mux_sel:        out std_ulogic_vector(1 downto 0);
                       SIGNAL c_reg_addr:     out addr;
                       SIGNAL data_in_latch:  out std_ulogic;
                       SIGNAL c_reg_latch:    out std_ulogic);

    procedure move_out (SIGNAL reg:            in  addr;
                        SIGNAL sys_clk:        in  std_ulogic;
                        SIGNAL b_reg_addr:     out addr;
                        SIGNAL data_out_latch: out std_ulogic);

    procedure add (SIGNAL reg1, reg2, reg3: in  addr;
                   SIGNAL sys_clk:          in  std_ulogic;
                   SIGNAL adder_done:       in  std_ulogic;
                   SIGNAL a_reg_addr, b_reg_addr, c_reg_addr: out addr;
                   SIGNAL adder_sel:        out std_ulogic;
                   SIGNAL mux_sel:          out std_ulogic_vector(1 downto 0);
                   SIGNAL adder_go:         out std_ulogic;
                   SIGNAL c_reg_latch:      out std_ulogic);

    procedure sub (SIGNAL reg1, reg2, reg3: in  addr;
                   SIGNAL sys_clk:          in  std_ulogic;
                   SIGNAL adder_done:       in  std_ulogic;
                   SIGNAL a_reg_addr, b_reg_addr, c_reg_addr: out addr;
                   SIGNAL adder_sel:        out std_ulogic;
                   SIGNAL mux_sel:          out std_ulogic_vector(1 downto 0);
                   SIGNAL adder_go:         out std_ulogic;
                   SIGNAL c_reg_latch:      out std_ulogic);

    procedure mult (SIGNAL reg1, reg2, reg3: in  addr;
                    SIGNAL sys_clk:          in  std_ulogic;
                    SIGNAL mult_done:        in  std_ulogic;
                    SIGNAL a_reg_addr, b_reg_addr, c_reg_addr: out addr;
                    SIGNAL mux_sel:          out std_ulogic_vector(1 downto 0);
                    SIGNAL mult_go:          out std_ulogic;
                    SIGNAL c_reg_latch:      out std_ulogic);

    procedure cos (SIGNAL reg1, reg2:      in  addr;
                   SIGNAL sys_clk:         in  std_ulogic;
                   SIGNAL cos_sin_ready:   in  std_ulogic;
                   SIGNAL cos_sin_sel:     out std_ulogic;
                   SIGNAL a_reg_addr, c_reg_addr: out addr;
                   SIGNAL mux_sel:         out std_ulogic_vector(1 downto 0);
                   SIGNAL cos_sin_go:      out std_ulogic;
                   SIGNAL c_reg_latch:     out std_ulogic);

    procedure sin (SIGNAL reg1, reg2:      in  addr;
                   SIGNAL sys_clk:         in  std_ulogic;
                   SIGNAL cos_sin_ready:   in  std_ulogic;
                   SIGNAL cos_sin_sel:     out std_ulogic;
                   SIGNAL a_reg_addr, c_reg_addr: out addr;
                   SIGNAL mux_sel:         out std_ulogic_vector(1 downto 0);
                   SIGNAL cos_sin_go:      out std_ulogic;
                   SIGNAL c_reg_latch:     out std_ulogic);

end MICROSTORE;


-- Project:                   Thesis
-- Filename:                  microstore.vhd
-- Other files required:
-- Date:                      Oct 31 97
-- Entity/Architecture Name:  n/a
-- Developer:                 Steve Parmley

library IEEE;
use IEEE.std_logic_1164.all;
use WORK.reg_file_pkg.all;

package body MICROSTORE is

    -- MOVE_IN (reg): assume that data is present on input of latch
    procedure move_in (SIGNAL reg:            in  addr;
                       SIGNAL sys_clk:        in  std_ulogic;
                       SIGNAL mux_sel:        out std_ulogic_vector(1 downto 0);
                       SIGNAL c_reg_addr:     out addr;
                       SIGNAL data_in_latch:  out std_ulogic;
                       SIGNAL c_reg_latch:    out std_ulogic) is
    begin
        -- set mux to allow data in latch to reg
        mux_sel <= "11";

        -- set up register to write to
        c_reg_addr <= reg;

        -- latch the data already present on the input of the latch
        data_in_latch <= '1';

        wait until sys_clk'event and sys_clk = '1';

        -- hold latched value
        data_in_latch <= '0';

        -- and copy it into register file
        c_reg_latch <= '1';

        wait until sys_clk'event and sys_clk = '1';

        -- hold it in register file
        c_reg_latch <= '0';
    end move_in;


-- MOVE_OUT (reg)
procedure move_out (SIGNAL reg: in addr;
                    SIGNAL sys_clk: in std_ulogic;
                    SIGNAL b_reg_addr: out addr;
                    SIGNAL data_out_latch: out std_ulogic) is
begin
  -- set up register to write to
  b_reg_addr <= reg;

  wait until sys_clk'event and sys_clk='1';

  -- latch the data from the register file to the output
  data_out_latch <= '1';

  wait until sys_clk'event and sys_clk='1';

  -- hold it on the output
  data_out_latch <= '0';
end move_out;

-- ADD (reg1, reg2, reg3)
procedure add (SIGNAL reg1, reg2, reg3: in addr;
               SIGNAL sys_clk: in std_ulogic;
               SIGNAL adder_done: in std_ulogic;
               SIGNAL a_reg_addr, b_reg_addr, c_reg_addr: out addr;
               SIGNAL adder_sel: out std_ulogic;
               SIGNAL mux_sel: out std_ulogic_vector(1 downto 0);
               SIGNAL adder_go: out std_ulogic;
               SIGNAL c_reg_latch: out std_ulogic) is
begin
  -- set up two terms from reg file
  a_reg_addr <= reg2;
  b_reg_addr <= reg3;

  -- set up new register to hold result
  c_reg_addr <= reg1;

  -- set adder/subtractor to add
  adder_sel <= '0';

  -- set mux to allow add result to go to register
  mux_sel <= "01";

  wait until sys_clk'event and sys_clk='1';

  -- initiate adder unit
  adder_go <= '1';

  wait until adder_done = '1';
  wait until sys_clk'event and sys_clk='1';

  -- release adder unit
  adder_go <= '0';

  -- latch result into register
  c_reg_latch <= '1';

  wait until sys_clk'event and sys_clk='1';

  -- hold result in register
  c_reg_latch <= '0';
end add;


-- SUB (reg1, reg2, reg3)
procedure sub (SIGNAL reg1, reg2, reg3: in addr;
               SIGNAL sys_clk: in std_ulogic;
               SIGNAL adder_done: in std_ulogic;
               SIGNAL a_reg_addr, b_reg_addr, c_reg_addr: out addr;
               SIGNAL adder_sel: out std_ulogic;
               SIGNAL mux_sel: out std_ulogic_vector(1 downto 0);
               SIGNAL adder_go: out std_ulogic;
               SIGNAL c_reg_latch: out std_ulogic) is
begin
  -- set up two terms from reg file
  a_reg_addr <= reg2;
  b_reg_addr <= reg3;

  -- set up new register to hold result
  c_reg_addr <= reg1;

  -- set adder/subtractor to sub
  adder_sel <= '1';

  -- set mux to allow add result to go to register
  mux_sel <= "01";

  wait until sys_clk'event and sys_clk='1';

  -- initiate adder unit
  adder_go <= '1';

  wait until adder_done = '1';
  wait until sys_clk'event and sys_clk='1';

  -- release adder unit
  adder_go <= '0';

  -- latch result into register
  c_reg_latch <= '1';

  wait until sys_clk'event and sys_clk='1';

  -- hold result in register
  c_reg_latch <= '0';
end sub;


-- MULTIPLY (reg1, reg2, reg3)
procedure mult (SIGNAL reg1, reg2, reg3: in addr;
                SIGNAL sys_clk: in std_ulogic;
                SIGNAL mult_done: in std_ulogic;
                SIGNAL a_reg_addr, b_reg_addr, c_reg_addr: out addr;
                SIGNAL mux_sel: out std_ulogic_vector(1 downto 0);
                SIGNAL mult_go: out std_ulogic;
                SIGNAL c_reg_latch: out std_ulogic) is
begin
  -- set up two terms from reg file
  a_reg_addr <= reg2;
  b_reg_addr <= reg3;

  -- set up new register to hold result
  c_reg_addr <= reg1;

  -- set mux to allow mult result to go to register
  mux_sel <= "10";

  wait until sys_clk'event and sys_clk='1';

  -- initiate multiplier unit
  mult_go <= '1';

  wait until mult_done = '1';
  wait until sys_clk'event and sys_clk='1';

  -- release mult unit
  mult_go <= '0';

  -- latch results into register
  c_reg_latch <= '1';

  wait until sys_clk'event and sys_clk='1';

  -- hold results in register
  c_reg_latch <= '0';
end mult;

-- COS (reg1, reg2)
procedure cos (SIGNAL reg1, reg2: in addr;
               SIGNAL sys_clk: in std_ulogic;
               SIGNAL cos_sin_ready: in std_ulogic;
               SIGNAL cos_sin_sel: out std_ulogic;
               SIGNAL a_reg_addr, c_reg_addr: out addr;
               SIGNAL mux_sel: out std_ulogic_vector(1 downto 0);
               SIGNAL cos_sin_go: out std_ulogic;
               SIGNAL c_reg_latch: out std_ulogic) is
begin
  -- set unit to do cosine
  cos_sin_sel <= '0';

  -- set input to A register
  a_reg_addr <= reg2;

  -- set up mux to allow cos/sin unit to go to registers
  mux_sel <= "00";

  -- set up new register to put result
  c_reg_addr <= reg1;

  wait until sys_clk'event and sys_clk='1';

  -- initiate unit
  cos_sin_go <= '1';

  wait until cos_sin_ready='1';
  wait until sys_clk'event and sys_clk='1';

  -- release unit
  cos_sin_go <= '0';

  -- latch result into register
  c_reg_latch <= '1';

  wait until sys_clk'event and sys_clk='1';

  -- hold result in register
  c_reg_latch <= '0';
end cos;


-- SIN (reg1, reg2)
procedure sin (SIGNAL reg1, reg2: in addr;
               SIGNAL sys_clk: in std_ulogic;
               SIGNAL cos_sin_ready: in std_ulogic;
               SIGNAL cos_sin_sel: out std_ulogic;
               SIGNAL a_reg_addr, c_reg_addr: out addr;
               SIGNAL mux_sel: out std_ulogic_vector(1 downto 0);
               SIGNAL cos_sin_go: out std_ulogic;
               SIGNAL c_reg_latch: out std_ulogic) is
begin
  -- set unit to do sine
  cos_sin_sel <= '1';

  -- set input to A register
  a_reg_addr <= reg2;

  -- set up mux to allow cos/sin unit to go to registers
  mux_sel <= "00";

  -- set up new register to put result
  c_reg_addr <= reg1;

  wait until sys_clk'event and sys_clk='1';

  -- initiate unit
  cos_sin_go <= '1';

  wait until cos_sin_ready='1';
  wait until sys_clk'event and sys_clk='1';

  -- release unit
  cos_sin_go <= '0';

  -- latch result into register
  c_reg_latch <= '1';

  wait until sys_clk'event and sys_clk='1';

  -- hold result in register
  c_reg_latch <= '0';
end sin;

end MICROSTORE;
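
The add, sub, mult, cos, and sin procedures in this package body all end with the same handshake: raise the unit's go line, wait for its done or ready flag, then strobe c_reg_latch for one clock. As a sketch only, that shared tail could be factored into a helper. The package name handshake_sketch and the procedure name run_unit are hypothetical and do not appear in the thesis sources.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- Hypothetical helper package (not part of the thesis sources).
package handshake_sketch is
  procedure run_unit (signal sys_clk     : in  std_ulogic;
                      signal unit_done   : in  std_ulogic;
                      signal unit_go     : out std_ulogic;
                      signal c_reg_latch : out std_ulogic);
end handshake_sketch;

package body handshake_sketch is
  procedure run_unit (signal sys_clk     : in  std_ulogic;
                      signal unit_done   : in  std_ulogic;
                      signal unit_go     : out std_ulogic;
                      signal c_reg_latch : out std_ulogic) is
  begin
    -- give the address and mux selections one clock to settle
    wait until sys_clk'event and sys_clk = '1';

    -- initiate the functional unit
    unit_go <= '1';

    -- as in the original procedures, this resumes only on an event
    -- on unit_done that leaves it at '1'
    wait until unit_done = '1';
    wait until sys_clk'event and sys_clk = '1';

    -- release the unit and latch the result into the register file
    unit_go <= '0';
    c_reg_latch <= '1';
    wait until sys_clk'event and sys_clk = '1';

    -- hold the result in the register file
    c_reg_latch <= '0';
  end run_unit;
end handshake_sketch;
```

Under this assumption, the body of add would shrink to its address, adder_sel, and mux_sel assignments followed by run_unit(sys_clk, adder_done, adder_go, c_reg_latch); the other four procedures would differ only in which done and go signals they pass.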


B.5.2  Microstore  Testbench


-- Project:                   Thesis
-- Filename:                  microstore-bench.vhd
-- Other files required:      microstore.vhd
-- Date:                      Oct 31 97
-- Entity/Architecture Name:  microstore_tb/test
-- Developer:                 Steve Parmley


library IEEE;
use IEEE.std_logic_1164.all;
use WORK.reg_file_pkg.all;
use WORK.microstore.all;

entity microstore_tb is
end microstore_tb;


architecture test of microstore_tb is

component fkp_core_e
  port (fkp_core_clk            : in  std_ulogic;
        fkp_core_reset          : in  std_ulogic;
        fkp_core_data_in        : in  std_ulogic_vector(15 downto 0);
        fkp_core_data_out       : out std_ulogic_vector(15 downto 0);
        fkp_core_data_in_latch  : in  std_ulogic;
        fkp_core_data_out_latch : in  std_ulogic;
        fkp_core_c_reg_latch    : in  std_ulogic;
        fkp_core_c_reg_addr     : in  addr;
        fkp_core_a_reg_addr     : in  addr;
        fkp_core_b_reg_addr     : in  addr;
        fkp_core_cos_sin_ready  : out std_ulogic;
        fkp_core_cos_sin_go     : in  std_ulogic;
        fkp_core_cos_sin_sel    : in  std_ulogic;
        fkp_core_cos_sin_wait   : in  std_ulogic_vector(2 downto 0);
        fkp_core_rom_addr       : out std_ulogic_vector(12 downto 0);
        fkp_core_rom_data       : in  std_ulogic_vector(15 downto 0);
        fkp_core_adder_go       : in  std_ulogic;
        fkp_core_adder_sel      : in  std_ulogic;
        fkp_core_adder_done     : out std_ulogic;
        fkp_core_mult_go        : in  std_ulogic;
        fkp_core_mult_done      : out std_ulogic;
        fkp_core_mux_sel        : in  std_ulogic_vector(1 downto 0));
end component;

signal sys_reset, sys_clk : std_ulogic := '0';
signal a_reg_addr, b_reg_addr, c_reg_addr : addr;
signal data_in, data_out : std_ulogic_vector(15 downto 0);
signal data_in_latch, data_out_latch, c_reg_latch, cos_sin_ready : std_ulogic;
signal cos_sin_go, cos_sin_sel, adder_go, adder_sel, adder_done : std_ulogic;
signal mult_go, mult_done : std_ulogic;
signal cos_sin_wait : std_ulogic_vector(2 downto 0);
signal rom_addr : std_ulogic_vector(12 downto 0);
signal rom_data : std_ulogic_vector(15 downto 0);
signal mux_sel : std_ulogic_vector(1 downto 0);

type opcode is (illegal, movein, moveout, move, cosine, sine, addition, subtraction, multiplication);
signal instruction : opcode;
signal reg1, reg2, reg3 : addr;


begin

U1 : fkp_core_e
  PORT MAP (sys_clk,
            sys_reset,
            data_in,
            data_out,
            data_in_latch,
            data_out_latch,
            c_reg_latch,
            c_reg_addr,
            a_reg_addr,
            b_reg_addr,
            cos_sin_ready,
            cos_sin_go,
            cos_sin_sel,
            cos_sin_wait,
            rom_addr,
            rom_data,
            adder_go,
            adder_sel,
            adder_done,
            mult_go,
            mult_done,
            mux_sel);

clock : process
begin
  sys_clk <= not(sys_clk);
  wait for 10 ps;
end process clock;

rst : process
begin
  sys_reset <= '1';
  wait for 40 ps;
  sys_reset <= '0';
  wait for 50000 ps;
end process rst;

exercise : process
begin
  instruction <= illegal;
  data_in_latch <= '0';
  data_out_latch <= '0';
  c_reg_latch <= '0';
  cos_sin_go <= '0';
  cos_sin_wait <= "111";
  adder_go <= '0';
  mult_go <= '0';
  a_reg_addr <= 15;
  b_reg_addr <= 15;
  c_reg_addr <= 15;
  mux_sel <= "00";
  wait for 60 ps;

  wait until sys_clk'event and sys_clk='1';

  -- MOVE IN
  instruction <= movein;
  data_in <= "0000000000000101";
  reg1 <= 2;

  wait until sys_clk'event and sys_clk='1';

  move_in(reg1,sys_clk,mux_sel,c_reg_addr,data_in_latch,c_reg_latch);
  -- END MOVE IN


  -- MOVE OUT
  instruction <= moveout;
  reg1 <= 2;

  wait until sys_clk'event and sys_clk='1';

  move_out(reg1,sys_clk,b_reg_addr,data_out_latch);
  -- END MOVE OUT

  -- MOVE IN
  instruction <= movein;
  data_in <= "0000000001001011";
  reg1 <= 3;

  wait until sys_clk'event and sys_clk='1';

  move_in(reg1,sys_clk,mux_sel,c_reg_addr,data_in_latch,c_reg_latch);
  -- END MOVE IN

  -- MOVE OUT
  instruction <= moveout;
  reg1 <= 3;

  wait until sys_clk'event and sys_clk='1';

  move_out(reg1,sys_clk,b_reg_addr,data_out_latch);
  -- END MOVE OUT

  -- ADD
  instruction <= addition;
  reg1 <= 10;
  reg2 <= 2;
  reg3 <= 3;

  wait until sys_clk'event and sys_clk='1';

  add(reg1,reg2,reg3,sys_clk,adder_done,a_reg_addr,b_reg_addr,c_reg_addr,
      adder_sel,mux_sel,adder_go,c_reg_latch);
  -- END ADD

  -- MOVE OUT
  instruction <= moveout;
  reg1 <= 10;

  wait until sys_clk'event and sys_clk='1';

  move_out(reg1,sys_clk,b_reg_addr,data_out_latch);
  -- END MOVE OUT

  -- MOVE
  instruction <= move;
  reg1 <= 11;
  reg2 <= 0;
  reg3 <= 10;

  wait until sys_clk'event and sys_clk='1';

  add(reg1,reg2,reg3,sys_clk,adder_done,a_reg_addr,b_reg_addr,c_reg_addr,
      adder_sel,mux_sel,adder_go,c_reg_latch);
  -- END MOVE


  for i in 0 to 3 loop
    -- SUB
    instruction <= subtraction;
    reg1 <= 11;
    reg2 <= 11;
    reg3 <= 1;

    wait until sys_clk'event and sys_clk='1';

    sub(reg1,reg2,reg3,sys_clk,adder_done,a_reg_addr,b_reg_addr,c_reg_addr,
        adder_sel,mux_sel,adder_go,c_reg_latch);
    -- END SUB

    -- MOVE OUT
    instruction <= moveout;
    reg1 <= 11;

    wait until sys_clk'event and sys_clk='1';

    move_out(reg1,sys_clk,b_reg_addr,data_out_latch);
    -- END MOVE OUT
  end loop;

  -- MULTIPLY
  instruction <= multiplication;
  reg1 <= 31;
  reg2 <= 2;
  reg3 <= 3;

  wait until sys_clk'event and sys_clk='1';

  mult(reg1,reg2,reg3,sys_clk,mult_done,a_reg_addr,b_reg_addr,c_reg_addr,
       mux_sel,mult_go,c_reg_latch);
  -- END MULTIPLY

  for i in 0 to 31 loop
    -- MOVE OUT
    instruction <= moveout;
    reg1 <= i;

    wait until sys_clk'event and sys_clk='1';

    move_out(reg1,sys_clk,b_reg_addr,data_out_latch);
    -- END MOVE OUT
  end loop;


  -- COSINE
  instruction <= cosine;
  reg2 <= 2;
  reg1 <= 15;

  wait until sys_clk'event and sys_clk='1';

  cos(reg1,reg2,sys_clk,cos_sin_ready,cos_sin_sel,a_reg_addr,
      c_reg_addr,mux_sel,cos_sin_go,c_reg_latch);

  -- MOVE OUT
  instruction <= moveout;
  reg1 <= 15;

  wait until sys_clk'event and sys_clk='1';

  move_out(reg1,sys_clk,b_reg_addr,data_out_latch);
  -- END MOVE OUT

  -- SINE
  instruction <= sine;
  reg2 <= 3;
  reg1 <= 16;

  wait until sys_clk'event and sys_clk='1';

  sin(reg1,reg2,sys_clk,cos_sin_ready,cos_sin_sel,a_reg_addr,
      c_reg_addr,mux_sel,cos_sin_go,c_reg_latch);

  -- MOVE OUT
  instruction <= moveout;
  reg1 <= 16;

  wait until sys_clk'event and sys_clk='1';

  move_out(reg1,sys_clk,b_reg_addr,data_out_latch);
  -- END MOVE OUT

  wait until sys_clk'event and sys_clk='1';
  wait until sys_clk'event and sys_clk='1';
  wait until sys_clk'event and sys_clk='1';

  ASSERT false
    REPORT "DONE"
    SEVERITY failure;

end process exercise;

rom : process
begin
  wait until rom_addr'event;

  -- make up some rom data (inverse of the address for now)
  rom_data(12 downto 0) <= not(rom_addr(12 downto 0));

  -- fill in the rest
  rom_data(15 downto 13) <= "000";
end process rom;

end test;

CONFIGURATION microstore_c OF microstore_tb IS
  FOR test
    FOR ALL: fkp_core_e
      USE ENTITY WORK.fkp_core_e(structural);
    END FOR;
  END FOR;
END microstore_c;
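
The testbench above drives each microstore procedure and reads the result back with move_out, but the listing contains no automatic comparison against expected values. If such a check is wanted, one possible sketch is a small assertion helper. The package name tb_check_sketch, the procedure name check_word, and its parameters are hypothetical and are not part of the thesis sources.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- Hypothetical checking helper (not part of the thesis sources).
package tb_check_sketch is
  procedure check_word (signal   actual   : in std_ulogic_vector(15 downto 0);
                        constant expected : in std_ulogic_vector(15 downto 0);
                        constant tag      : in string);
end tb_check_sketch;

package body tb_check_sketch is
  procedure check_word (signal   actual   : in std_ulogic_vector(15 downto 0);
                        constant expected : in std_ulogic_vector(15 downto 0);
                        constant tag      : in string) is
  begin
    -- report an error, without stopping the run, when the words differ
    assert actual = expected
      report tag & ": output word differs from expected value"
      severity error;
  end check_word;
end tb_check_sketch;
```

A call such as check_word(data_out, "0000000000000101", "r2 readback"); could follow the first move_out. This assumes data_out is already stable when move_out returns, which the listing does not confirm; an extra clock wait before the check may be needed.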


B.5.3  Microstore  Results

B.9  Control


B.5.9  Control  Model


-- Project:                   Thesis
-- Filename:                  fkp.vhd
-- Other files required:      all FKP files
-- Date:                      Oct 17 97
-- Entity/Architecture Name:  fkp_e/behavior
-- Developer:                 Steve Parmley
-- Function:
-- Limitations:
-- History:
-- Last Analyzed On:


library IEEE;
use IEEE.std_logic_1164.all;
use WORK.reg_file_pkg.all;
use WORK.microstore.all;

entity fkp_e is
  port (fkp_cntprt7_clock  : in  std_ulogic;
        fkp_cntprt6_reset  : in  std_ulogic;
        fkp_cntprt5_strobe : in  std_ulogic;
        fkp_cntprt4_ready  : out std_ulogic;
        fkp_cntprt3_dgv    : out std_ulogic;
        fkp_cntprt2_dga    : in  std_ulogic;
        -- fkp_cntprt1_dsv : in  std_ulogic;
        -- fkp_cntprt0_dsa : out std_ulogic;
        fkp_cmdprt6_cmd1   : in  std_ulogic;
        fkp_cmdprt5_cmd0   : in  std_ulogic;
        fkp_cmdprt4_a4     : in  std_ulogic;
        fkp_cmdprt3_a3     : in  std_ulogic;
        fkp_cmdprt2_a2     : in  std_ulogic;
        fkp_cmdprt1_a1     : in  std_ulogic;
        fkp_cmdprt0_a0     : in  std_ulogic;
        fkp_data_in        : in  std_ulogic_vector(15 downto 0);
        fkp_data_out       : out std_ulogic_vector(15 downto 0);
        fkp_rom_addr       : out std_ulogic_vector(12 downto 0);
        fkp_rom_data       : in  std_ulogic_vector(15 downto 0));
end fkp_e;

architecture structural of fkp_e is

-- SIGNALS
signal sys_reset, sys_clk : std_ulogic := '0';
signal a_reg_addr, b_reg_addr, c_reg_addr : addr;
signal data_in, data_out : std_ulogic_vector(15 downto 0);
signal data_in_latch, data_out_latch, c_reg_latch, cos_sin_ready : std_ulogic;
signal cos_sin_go, cos_sin_sel, adder_go, adder_sel, adder_done : std_ulogic;
signal mult_go, mult_done : std_ulogic;
signal cos_sin_wait : std_ulogic_vector(2 downto 0);
signal rom_addr : std_ulogic_vector(12 downto 0);
signal rom_data : std_ulogic_vector(15 downto 0);
signal mux_sel : std_ulogic_vector(1 downto 0);

type opcode is (illegal, movein, moveout, move, cosine, sine, addition, subtraction, multiplication);
signal instruction : opcode;
signal reg1, reg2, reg3 : addr;

signal state : integer;


-- COMPONENTS
component fkp_core_e
  port (fkp_core_clk            : in  std_ulogic;
        fkp_core_reset          : in  std_ulogic;
        fkp_core_data_in        : in  std_ulogic_vector(15 downto 0);
        fkp_core_data_out       : out std_ulogic_vector(15 downto 0);
        fkp_core_data_in_latch  : in  std_ulogic;
        fkp_core_data_out_latch : in  std_ulogic;
        fkp_core_c_reg_latch    : in  std_ulogic;
        fkp_core_c_reg_addr     : in  addr;
        fkp_core_a_reg_addr     : in  addr;
        fkp_core_b_reg_addr     : in  addr;
        fkp_core_cos_sin_ready  : out std_ulogic;
        fkp_core_cos_sin_go     : in  std_ulogic;
        fkp_core_cos_sin_sel    : in  std_ulogic;
        fkp_core_cos_sin_wait   : in  std_ulogic_vector(2 downto 0);
        fkp_core_rom_addr       : out std_ulogic_vector(12 downto 0);
        fkp_core_rom_data       : in  std_ulogic_vector(15 downto 0);
        fkp_core_adder_go       : in  std_ulogic;
        fkp_core_adder_sel      : in  std_ulogic;
        fkp_core_adder_done     : out std_ulogic;
        fkp_core_mult_go        : in  std_ulogic;
        fkp_core_mult_done      : out std_ulogic;
        fkp_core_mux_sel        : in  std_ulogic_vector(1 downto 0));
end component;

begin

U1 : fkp_core_e
  PORT MAP (sys_clk,
            sys_reset,
            data_in,
            data_out,
            data_in_latch,
            data_out_latch,
            c_reg_latch,
            c_reg_addr,
            a_reg_addr,
            b_reg_addr,
            cos_sin_ready,
            cos_sin_go,
            cos_sin_sel,
            cos_sin_wait,
            rom_addr,
            rom_data,
            adder_go,
            adder_sel,
            adder_done,
            mult_go,
            mult_done,
            mux_sel);


controller : process
  variable r1 : integer;
begin
  sys_clk <= fkp_cntprt7_clock;

  -- system wide reset ?
  if fkp_cntprt6_reset = '1' then
    sys_reset <= '1';

    wait until sys_clk'event and sys_clk='1';
    sys_reset <= '1';
    state <= 0;

    fkp_cntprt4_ready <= '1';
    fkp_cntprt3_dgv <= '0';
  end if;

  -- ready to accept command
  if state = 0 then

    -- either set, get, or run
    if fkp_cntprt5_strobe = '1' then

      -- set
      if fkp_cmdprt6_cmd1='0' and fkp_cmdprt5_cmd0='0' then
        -- set not ready flag
        fkp_cntprt4_ready <= '0';

        -- set the register designated by the a4-a0 bits to
        -- the data from the input data bus

        -- MOVE IN
        instruction <= movein;
        data_in <= fkp_data_in;

        -- transform bits to integer
        r1 := 0;
        if fkp_cmdprt4_a4 = '1' then
          r1 := r1 + 16;
        end if;
        if fkp_cmdprt3_a3 = '1' then
          r1 := r1 + 8;
        end if;
        if fkp_cmdprt2_a2 = '1' then
          r1 := r1 + 4;
        end if;
        if fkp_cmdprt1_a1 = '1' then
          r1 := r1 + 2;
        end if;
        if fkp_cmdprt0_a0 = '1' then
          r1 := r1 + 1;
        end if;

        -- set target register
        reg1 <= r1;

        wait until sys_clk'event and sys_clk='1';

        move_in(reg1,sys_clk,mux_sel,c_reg_addr,data_in_latch,c_reg_latch);
        -- END MOVE IN

        wait until fkp_cntprt5_strobe = '0';

        -- set ready flag
        fkp_cntprt4_ready <= '1';

      -- get
      elsif fkp_cmdprt6_cmd1='0' and fkp_cmdprt5_cmd0='1' then
        -- set not ready flag
        fkp_cntprt4_ready <= '0';

        -- get the register designated by the a4-a0 bits and
        -- drive its data onto the output data bus

        -- MOVE OUT
        instruction <= moveout;

        -- transform bits to integer
        r1 := 0;
        if fkp_cmdprt4_a4 = '1' then
          r1 := r1 + 16;
        end if;
        if fkp_cmdprt3_a3 = '1' then
          r1 := r1 + 8;
        end if;
        if fkp_cmdprt2_a2 = '1' then
          r1 := r1 + 4;
        end if;
        if fkp_cmdprt1_a1 = '1' then
          r1 := r1 + 2;
        end if;
        if fkp_cmdprt0_a0 = '1' then
          r1 := r1 + 1;
        end if;

        -- set target register
        reg1 <= r1;

        wait until sys_clk'event and sys_clk='1';

        move_out(reg1,sys_clk,b_reg_addr,data_out_latch);
        -- END MOVE OUT

        fkp_data_out <= data_out;

        -- let user know data is valid
        fkp_cntprt3_dgv <= '1';

        wait until fkp_cntprt2_dga = '1';

        -- user has data
        -- release dgv
        fkp_cntprt3_dgv <= '0';

        wait until fkp_cntprt5_strobe = '0';

        -- set ready flag
        fkp_cntprt4_ready <= '1';

      -- run
      elsif fkp_cmdprt6_cmd1='1' and fkp_cmdprt5_cmd0='0' then
        -- set not ready flag
        fkp_cntprt4_ready <= '0';

        -- ASSUME that the 5 constants are located in r2,r3,r4,r5,r6
        -- ASSUME that the 4 angles are located in r7,r8,r9,r10
        -- this was accomplished using the set function
        -- See table 4.4b of Thesis for order of operations

        -- **** STEP 1 ****
        -- desc: reg 26 = sin of theta 1
        instruction <= sine;
        reg1 <= 26;
        reg2 <= 7;

        wait until sys_clk'event and sys_clk='1';
        sin(reg1,reg2,sys_clk,cos_sin_ready,cos_sin_sel,a_reg_addr,
            c_reg_addr,mux_sel,cos_sin_go,c_reg_latch);

        -- **** STEP 2 ****
        -- desc: reg 11 = cos of theta 1
        instruction <= cosine;
        reg1 <= 11;
        reg2 <= 7;

        wait until sys_clk'event and sys_clk='1';
        cos(reg1,reg2,sys_clk,cos_sin_ready,cos_sin_sel,a_reg_addr,
            c_reg_addr,mux_sel,cos_sin_go,c_reg_latch);

        -- **** STEP 3 ****
        -- desc: reg 12 = sin of theta 2
        instruction <= sine;
        reg1 <= 12;
        reg2 <= 8;

        wait until sys_clk'event and sys_clk='1';
        sin(reg1,reg2,sys_clk,cos_sin_ready,cos_sin_sel,a_reg_addr,
            c_reg_addr,mux_sel,cos_sin_go,c_reg_latch);

        -- **** STEP 4 ****
        -- desc: reg 13 = cos of theta 2
        instruction <= cosine;
        reg1 <= 13;
        reg2 <= 8;

        wait until sys_clk'event and sys_clk='1';
        cos(reg1,reg2,sys_clk,cos_sin_ready,cos_sin_sel,a_reg_addr,
            c_reg_addr,mux_sel,cos_sin_go,c_reg_latch);

        -- **** STEP 5 ****
        -- desc: reg 14 = theta 2 + theta 3
        instruction <= addition;
        reg1 <= 14;
        reg2 <= 8;
        reg3 <= 9;

        wait until sys_clk'event and sys_clk='1';
        add(reg1,reg2,reg3,sys_clk,adder_done,a_reg_addr,b_reg_addr,c_reg_addr,
            adder_sel,mux_sel,adder_go,c_reg_latch);

        -- **** STEP 6 ****
        -- desc: reg 15 = sin of theta 2+3
        instruction <= sine;
        reg1 <= 15;
        reg2 <= 14;

        wait until sys_clk'event and sys_clk='1';
        sin(reg1,reg2,sys_clk,cos_sin_ready,cos_sin_sel,a_reg_addr,
            c_reg_addr,mux_sel,cos_sin_go,c_reg_latch);

        -- **** STEP 7 ****
        -- desc: reg 16 = cos of theta 2+3
        instruction <= cosine;
        reg1 <= 16;
        reg2 <= 14;

        wait until sys_clk'event and sys_clk='1';
        cos(reg1,reg2,sys_clk,cos_sin_ready,cos_sin_sel,a_reg_addr,
            c_reg_addr,mux_sel,cos_sin_go,c_reg_latch);

        -- **** STEP 8 ****
        -- desc: reg 14 = theta 2 + theta 3 + theta 4
        instruction <= addition;
        reg1 <= 14;
        reg2 <= 14;
        reg3 <= 10;

        wait until sys_clk'event and sys_clk='1';
        add(reg1,reg2,reg3,sys_clk,adder_done,a_reg_addr,b_reg_addr,c_reg_addr,
            adder_sel,mux_sel,adder_go,c_reg_latch);

        -- **** STEP 9 ****
        -- desc: reg 22 = sin of theta 2+3+4
        instruction <= sine;
        reg1 <= 22;
        reg2 <= 14;

        wait until sys_clk'event and sys_clk='1';
        sin(reg1,reg2,sys_clk,cos_sin_ready,cos_sin_sel,a_reg_addr,
            c_reg_addr,mux_sel,cos_sin_go,c_reg_latch);

        -- **** STEP 10 ****
        -- desc: reg 25 = cos of theta 2+3+4
        instruction <= cosine;
        reg1 <= 25;
        reg2 <= 14;

        wait until sys_clk'event and sys_clk='1';
        cos(reg1,reg2,sys_clk,cos_sin_ready,cos_sin_sel,a_reg_addr,
            c_reg_addr,mux_sel,cos_sin_go,c_reg_latch);

        -- **** STEP 11 ****
        -- desc: reg 20 = cos(th1) * cos(th2+th3+th4)
        instruction <= multiplication;
        reg1 <= 20;
        reg2 <= 11;
        reg3 <= 25;

        wait until sys_clk'event and sys_clk='1';
        mult(reg1,reg2,reg3,sys_clk,mult_done,a_reg_addr,b_reg_addr,c_reg_addr,
             mux_sel,mult_go,c_reg_latch);

        -- **** STEP 12 ****
        -- desc: reg 21 = sin(th1) * cos(th2+th3+th4)
        instruction <= multiplication;
        reg1 <= 21;
        reg2 <= 26;
        reg3 <= 25;

        wait until sys_clk'event and sys_clk='1';
        mult(reg1,reg2,reg3,sys_clk,mult_done,a_reg_addr,b_reg_addr,c_reg_addr,
             mux_sel,mult_go,c_reg_latch);

        -- **** STEP 13 ****
        -- desc: reg 23 = cos(th1) * sin(th2+th3+th4)
        instruction <= multiplication;
        reg1 <= 23;
        reg2 <= 11;
        reg3 <= 22;

        wait until sys_clk'event and sys_clk='1';
        mult(reg1,reg2,reg3,sys_clk,mult_done,a_reg_addr,b_reg_addr,c_reg_addr,
             mux_sel,mult_go,c_reg_latch);

        -- **** STEP 14 ****
        -- desc: reg 23 = -(cos(th1) * sin(th2+th3+th4))
        instruction <= subtraction;
        reg1 <= 23;
        reg2 <= 0;
        reg3 <= 23;

        wait until sys_clk'event and sys_clk='1';
        sub(reg1,reg2,reg3,sys_clk,adder_done,a_reg_addr,b_reg_addr,c_reg_addr,
            adder_sel,mux_sel,adder_go,c_reg_latch);

        -- **** STEP 15 ****
        -- desc: reg 24 = sin(th1) * sin(th2+th3+th4)
        instruction <= multiplication;
        reg1 <= 24;
        reg2 <= 26;
        reg3 <= 22;

        wait until sys_clk'event and sys_clk='1';
        mult(reg1,reg2,reg3,sys_clk,mult_done,a_reg_addr,b_reg_addr,c_reg_addr,
             mux_sel,mult_go,c_reg_latch);

        -- **** STEP 16 ****
        -- desc: reg 24 = -(sin(th1) * sin(th2+th3+th4))
        instruction <= subtraction;
        reg1 <= 24;
        reg2 <= 0;
        reg3 <= 24;

        wait until sys_clk'event and sys_clk='1';
        sub(reg1,reg2,reg3,sys_clk,adder_done,a_reg_addr,b_reg_addr,c_reg_addr,
            adder_sel,mux_sel,adder_go,c_reg_latch);

        -- **** STEP 17 ****
        -- desc: reg 27 = -(cos(th1))
        instruction <= subtraction;
        reg1 <= 27;
        reg2 <= 0;
        reg3 <= 11;

        wait until sys_clk'event and sys_clk='1';
        sub(reg1,reg2,reg3,sys_clk,adder_done,a_reg_addr,b_reg_addr,c_reg_addr,
            adder_sel,mux_sel,adder_go,c_reg_latch);

        -- **** STEP 18 ****
        -- desc: reg 28 = 0
        instruction <= addition;
        reg1 <= 28;
        reg2 <= 0;
        reg3 <= 0;

        wait until sys_clk'event and sys_clk='1';
        add(reg1,reg2,reg3,sys_clk,adder_done,a_reg_addr,b_reg_addr,c_reg_addr,
            adder_sel,mux_sel,adder_go,c_reg_latch);

        -- **** STEP 19 ****
        -- desc: reg 17 = a2 * cos(th2)
        instruction <= multiplication;
        reg1 <= 17;
        reg2 <= 4;
        reg3 <= 13;

        wait until sys_clk'event and sys_clk='1';
        mult(reg1,reg2,reg3,sys_clk,mult_done,a_reg_addr,b_reg_addr,c_reg_addr,
             mux_sel,mult_go,c_reg_latch);

        -- **** STEP 20 ****
        -- desc: reg 18 = a3 * cos(th2+th3)
        instruction <= multiplication;
        reg1 <= 18;
        reg2 <= 5;
        reg3 <= 16;

        wait until sys_clk'event and sys_clk='1';
        mult(reg1,reg2,reg3,sys_clk,mult_done,a_reg_addr,b_reg_addr,c_reg_addr,
             mux_sel,mult_go,c_reg_latch);

        -- **** STEP 21 ****
        -- desc: reg 17 = a2*cos(th2) + a3*cos(th2+th3)
        instruction <= addition;
        reg1 <= 17;
        reg2 <= 17;
        reg3 <= 18;

        wait until sys_clk'event and sys_clk='1';
        add(reg1,reg2,reg3,sys_clk,adder_done,a_reg_addr,b_reg_addr,c_reg_addr,
            adder_sel,mux_sel,adder_go,c_reg_latch);

        -- **** STEP 22 ****
        -- desc: reg 17 = a1 + a2*cos(th2) + a3*cos(th2+th3)
        instruction <= addition;
        reg1 <= 17;
        reg2 <= 17;
        reg3 <= 3;

        wait until sys_clk'event and sys_clk='1';
        add(reg1,reg2,reg3,sys_clk,adder_done,a_reg_addr,b_reg_addr,c_reg_addr,
            adder_sel,mux_sel,adder_go,c_reg_latch);

        -- **** STEP 23 ****
        -- desc: reg 18 = cos(th1) * (a1 + a2*cos(th2) + a3*cos(th2+th3))
        instruction <= multiplication;
        reg1 <= 18;
        reg2 <= 17;
        reg3 <= 11;

        wait until sys_clk'event and sys_clk='1';
        mult(reg1,reg2,reg3,sys_clk,mult_done,a_reg_addr,b_reg_addr,c_reg_addr,
             mux_sel,mult_go,c_reg_latch);

        -- **** STEP 24 ****
        -- desc: reg 29 = a0 + cos(th1) * (a1 + a2*cos(th2) + a3*cos(th2+th3))
        instruction <= addition;
        reg1 <= 29;
        reg2 <= 18;
        reg3 <= 2;

        wait until sys_clk'event and sys_clk='1';
        add(reg1,reg2,reg3,sys_clk,adder_done,a_reg_addr,b_reg_addr,c_reg_addr,
            adder_sel,mux_sel,adder_go,c_reg_latch);

        -- **** STEP 25 ****
        -- desc: reg 30 = sin(th1) * (a1 + a2*cos(th2) + a3*cos(th2+th3))
        instruction <= multiplication;
        reg1 <= 30;
        reg2 <= 17;
        reg3 <= 26;

        wait until sys_clk'event and sys_clk='1';
        mult(reg1,reg2,reg3,sys_clk,mult_done,a_reg_addr,b_reg_addr,c_reg_addr,
             mux_sel,mult_go,c_reg_latch);

        -- **** STEP 26 ****
        -- desc: reg 19 = a2*sin(th2)
        instruction <= multiplication;
        reg1 <= 19;
        reg2 <= 4;
        reg3 <= 12;

        wait until sys_clk'event and sys_clk='1';
        mult(reg1,reg2,reg3,sys_clk,mult_done,a_reg_addr,b_reg_addr,c_reg_addr,
             mux_sel,mult_go,c_reg_latch);

        -- **** STEP 27 ****
        -- desc: reg 31 = a3 * sin(th2+th3)
        instruction <= multiplication;
        reg1 <= 31;
        reg2 <= 5;
        reg3 <= 15;

        wait until sys_clk'event and sys_clk='1';
        mult(reg1,reg2,reg3,sys_clk,mult_done,a_reg_addr,b_reg_addr,c_reg_addr,
             mux_sel,mult_go,c_reg_latch);

        -- **** STEP 28 ****
        -- desc: reg 31 = a2*sin(th2) + a3*sin(th2+th3)
        instruction <= addition;
        reg1 <= 31;
        reg2 <= 31;
        reg3 <= 19;

        wait until sys_clk'event and sys_clk='1';
        add(reg1,reg2,reg3,sys_clk,adder_done,a_reg_addr,b_reg_addr,c_reg_addr,
            adder_sel,mux_sel,adder_go,c_reg_latch);

        -- **** STEP 29 ****
        -- desc: reg 31 = a2*sin(th2) + a3*sin(th2+th3) + d1
        instruction <= addition;
        reg1 <= 31;
        reg2 <= 31;
        reg3 <= 6;

        wait until sys_clk'event and sys_clk='1';
        add(reg1,reg2,reg3,sys_clk,adder_done,a_reg_addr,b_reg_addr,c_reg_addr,
            adder_sel,mux_sel,adder_go,c_reg_latch);



        wait until fkp_cntprt5_strobe = '0';

        -- set ready flag
        fkp_cntprt4_ready <= '1';

      end if;
    end if;
  end if;

end process controller;
end structural;
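
The set and get branches of the controller each decode the a4-a0 command bits with the same chain of five if statements. As a minimal sketch, that decode could be written once as a function. The package name addr_decode_sketch and the function name bits_to_reg are hypothetical and do not appear in the thesis sources; the sketch relies only on std_logic_1164.

```vhdl
library IEEE;
use IEEE.std_logic_1164.all;

-- Hypothetical helper package (not part of the thesis sources).
package addr_decode_sketch is
  function bits_to_reg (bits : std_ulogic_vector(4 downto 0)) return natural;
end addr_decode_sketch;

package body addr_decode_sketch is
  function bits_to_reg (bits : std_ulogic_vector(4 downto 0)) return natural is
    variable r : natural := 0;
  begin
    -- walk from a4 (MSB) down to a0 (LSB), shifting in one bit per step;
    -- any value other than '1' is treated as 0, as in the original if chain
    for i in bits'range loop
      r := r * 2;
      if bits(i) = '1' then
        r := r + 1;
      end if;
    end loop;
    return r;
  end bits_to_reg;
end addr_decode_sketch;
```

With this in scope, each if chain could be replaced by r1 := bits_to_reg(fkp_cmdprt4_a4 & fkp_cmdprt3_a3 & fkp_cmdprt2_a2 & fkp_cmdprt1_a1 & fkp_cmdprt0_a0); for example, the bit pattern 01011 selects register 11.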
Appendix  C:  XACTstep  Synthesis  Log  File  for  Register  File

ngdbuild -p xc4000e C:\exemplar\work\reg16\reg16.xnf xc4000e.ngd
ngdbuild: version M1.3.7
Copyright (c) 1995-1997 Xilinx, Inc. All rights reserved.

Command Line: ngdbuild -p xc4000e C:\exemplar\work\reg16\reg16.xnf xc4000e.ngd

Launcher: Using rule XNF_RULE
Launcher: reg16.ngo being compiled because it does not exist
Launcher: Running xnf2ngd from C:\exemplar\work\reg16\xproj\ver1\
Launcher: Executing xnf2ngd -p xc4000e -u "C:\exemplar\work\reg16\reg16.xnf"
"reg16.ngo"

xnf2ngd: version M1.3.7
Copyright (c) 1995-1997 Xilinx, Inc. All rights reserved.
using XNF gate model
reading XNF file "C:/exemplar/work/reg16/reg16.xnf" ...
Writing NGO file "reg16.ngo" ...

Launcher: "xnf2ngd" exited with an exit code of 0.

Reading NGO file "C:/exemplar/work/reg16/xproj/ver1/reg16.ngo" ...

Reading  component  libraries  for  design  expansion... 

Running  Timing  Specification  DRC... 

Timing  Specification  DRC  complete  with  no  errors  or  warnings. 

Running  Logical  Design  DRC... 

Logical  Design  DRC  complete  with  no  errors  or  warnings. 

NGDBUILD  Design  Results  Summary: 

2148  total  blocks  expanded. 

Writing NGD file "xc4000e.ngd" ...

Writing NGDBUILD log file "xc4000e.bld"...

NGDBUILD Done.

map -p xc4020e-3-hq208 -o map.ncd ../xc4000e.ngd reg16.pcf
map: version M1.3.7
Copyright (c) 1995-1997 Xilinx, Inc. All rights reserved.

Reading NGD file "../xc4000e.ngd"...

Using target part "4020ehq208-3".

MAP xc4000e directives:
   Partname="xc4020e-3-hq208".
   No Guide File specified.
   No Guide Mode specified.
   Covermode="area".
   Coverlutsize=4.
   Coverfgsize=4.

Perform  logic  replication. 

Pack  CLBs  to  97%. 

Processing  logical  timing  constraints... 

Running  general  design  DRC... 

Verifying  F/HMAP  validity  based  on  pre-trimmed  logic... 

Removing  unused  logic... 

Processing  global  clock  buffers... 

WARNING:baste:24  -  All  of  the  external  outputs  in  this  design  are  using 
slew-rate-limited  output  drivers.  The  delay  on  speed  critical  outputs  can  be 
dramatically  reduced  by  designating  them  as  fast  outputs  in  the  original 
design.  Please  see  your  vendor  interface  documentation  for  specific 
information  on  how  to  do  this  within  your  design-entry  tool. 

Optimizing... 

Removed  Logic  Summary: 
Design Summary:

Number of warnings:          1
Number of errors:            0
Number of CLBs:            315 out of 784
Flops/latches:             224
4 input LUTs:              621
3 input LUTs:              183
Number of bonded IOBs:      63 out of 160
Number of clock IOBs:        1 out of 8
IO flops/latches:           32
Number of primary CLKs:      1 out of 4

Writing design file "map.ncd"...


par -w -l 4 -d 0 map.ncd reg16.ncd reg16.pcf
PAR: Xilinx Place And Route M1.3.7.

Copyright  (c)  1995-1997  Xilinx,  Inc.  All  rights  reserved. 


Constraints  file:  reg16.pcf 
Placement  level-cost:  4-1 


Loading device database for application par from file "map.ncd".

"reg16" is an NCD, version 2.27, device xc4020e, package hq208, speed -3
Loading device for application par from file '4020e.nph' in environment
d:/xilinx.

Device  speed  data  version:  x1_0.79  PRELIMINARY. 


Device utilization summary:

IO          63/224    28% used
            63/160    39% bonded
LOGIC      315/784    40% used
SPECIAL      1/3023    0% used

CLKIOB       1/8      12% used
IOB         62/224    27% used
CLB        315/784    40% used
PRI-CLK      1/4      25% used

Starting  initial  Placement  phase.  REAL  time:  13  secs 
Finished  initial  Placement  phase.  REAL  time:  14  secs 

Starting Constructive Placer. REAL time: 15 secs

Placer  score  =  1081980 

Placer  score  =  977380 

Placer  score  =  886140 

Placer  score  =  853480 

Placer  score  =  783540 

Placer  score  =  705220 

Placer  score  =  634260 

Placer  score  =  577740 

Placer  score  =  486240 

Placer  score  =  439200 

Placer  score  =  375240 

Placer  score  =  332160 

Placer  score  =  298500 

Placer  score  =  284400 

Placer  score  =  271260 

Placer  score  =  260940 
Placer score = 265660
Placer  score  =  252840 
Placer  score  =  248700 
Placer  score  =  246900 
Placer  score  =  245640 
Placer  score  =  244680 
Placer  score  =  244320 
Placer  score  =  242160 
Placer  score  =  241920 
Placer score = 241140
Placer  score  =  240240 
Placer  score  =  239220 
Placer  score  =  238920 
Placer  score  =  238560 
Placer  score  =  237900 

Finished  Constructive  Placer.  REAL  time:  11  mins  30  secs 


Dumping  design  to  file  "reg16.ncd". 

Starting  Optimizing  Placer.  REAL  time:  11  mins  31  secs 

Optimizing  . 

Swapped  30  comps. 

Xilinx  Placer  [1]  235080  REAL  time:  12  mins  40  secs 

Optimizing  . 

Swapped  5  comps. 

Xilinx  Placer  [2]  234840  REAL  time:  13  mins  45  secs 
Finished Optimizing Placer. REAL time: 13 mins 45 secs


Dumping design to file "reg16.ncd".

Total  REAL  time  to  Placer  completion:  13  mins  47  secs 
Total CPU time to Placer completion: 13 mins 47 secs

0  connection(s)  routed;  2231  unrouted. 

Starting  router  resource  preassignment 

Completed router resource preassignment. Real time: 13 mins 49 secs
Starting  iterative  routing. 

End  of  iteration  1 

2231  successful;  0  unrouted;  (0)  real  time:  14  mins 
Constraints  are  met. 

Power  and  ground  nets  completely  routed. 

Dumping  design  to  file  "reg16.ncd". 

Starting  cleanup 
End  of  cleanup  iteration  1 

2231  successful;  0  unrouted;  (0)  real  time:  15  mins  17  secs 
Dumping  design  to  file  "reg16.ncd". 

Total  CPU  time  15  mins  18  secs 
Total  REAL  time:  15  mins  18  secs 
Completely  routed. 

End  of  route.  2231  routed  (100.00%);  0  unrouted. 

No  errors  found. 

Total REAL time to Router completion: 15 mins 20 secs
Total  CPU  time  to  Router  completion:  15  mins  20  secs 

Generating  PAR  statistics. 

Timing  Score:  0 

Dumping design to file "reg16.ncd".


All  signals  are  completely  routed. 

Total REAL time to PAR completion: 15 mins 28 secs
Total CPU time to PAR completion: 15 mins 28 secs

PAR done.

bitgen reg16.ncd -l -w -f bitgen.ut

Loading device database for application Bitgen from file "reg16.ncd".

"reg16" is an NCD, version 2.27, device xc4020e, package hq208, speed -3
Loading device for application Bitgen from file '4020e.nph' in environment
d:/xilinx.


BITGEN: Xilinx Bitstream Generator M1.3.7
Copyright  (c)  1995-1997  Xilinx,  Inc.  All  rights  reserved. 

Running  DRC. 

DRC  detected  0  errors  and  0  warnings. 

Saving ll file in "reg16.ll".
Creating  bit  map... 

Saving bit stream in "reg16.bit".

xcpy reg16.bit C:\exemplar\work\reg16\reg16.bit

xcpy reg16.ll C:\exemplar\work\reg16\reg16.ll
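
The percentages in the PAR device utilization summary of this log can be cross-checked against the used/available counts printed beside them. The following Python sketch is not part of the original tool flow; the function names are hypothetical, and it assumes that PAR truncates each percentage to a whole number rather than rounding it, which is an inference from the printed figures and not a documented behavior.

```python
# Cross-check of the PAR "Device utilization summary" percentages.
# Assumption (not from Xilinx documentation): percentages are truncated.

# (resource, used, available, percentage printed in the log)
REPORTED = [
    ("IO used", 63, 224, 28),
    ("IO bonded", 63, 160, 39),
    ("LOGIC", 315, 784, 40),
    ("SPECIAL", 1, 3023, 0),
    ("CLKIOB", 1, 8, 12),
    ("IOB", 62, 224, 27),
    ("CLB", 315, 784, 40),
    ("PRI-CLK", 1, 4, 25),
]


def truncated_percent(used, available):
    """Return used/available as a whole-number percentage, truncated."""
    return (100 * used) // available


def mismatches(rows):
    """Return rows whose printed percentage differs from the recomputed one."""
    return [r for r in rows if truncated_percent(r[1], r[2]) != r[3]]


if __name__ == "__main__":
    for name, used, available, printed in REPORTED:
        computed = truncated_percent(used, available)
        print(f"{name:10s} {used:4d}/{available:<5d} {computed:3d}% (log: {printed}%)")
    print("mismatches:", mismatches(REPORTED))
```

Under the truncation assumption all eight rows agree with the log. Rounding to the nearest integer would give 28% for the 62/224 IOB row, not the 27% that PAR printed.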
Appendix D: Ironwood Electronics Adapter to IMS and FPGA Pinouts